CHAPTER 1


INTRODUCTION


1.1 Introduction


Recognizing patterns, classes, or populations in sample
data  is  an  important  part  of  many  problems  of  current 
interest.  Examples  such  as  scene  analysis,  character 
recognition,  and  speech  analysis  are  but  a  few  of  the  many 
areas  in  which  pattern  recognition  techniques  have  been 
utilized.  Pattern  recognition,  as  a  scientific  discipline, 
strives  to  produce  an  automated  procedure  for  assigning  each 
element  of  a  set  of  input  data  to  one  of  a  finite  set  of 
classes(1). Thus, a pattern recognition system should
reduce  the  quantity  of  data  present  while  retaining  the 
information  carried.  This  reduction  has  become  an 
increasingly  important  factor  as  the  quantity  of  data  made 
available  by  modern  digital  computer  systems  continues  to 
grow.  Without  successful  techniques  for  handling  and 
interpreting data, the sheer quantity produced can become a
burden rather than an aid. Pattern recognition techniques
are recognized as providing useful approaches to solution of
this problem(1). As a result, such techniques have been
used extensively in the design of computerized information
processing systems(1).

The  pattern  recognition  or  classification  model  is 
composed of the following three components: a transducer, a
feature extractor, and a classifier(2). The transducer
senses  the  input  and  converts  it  into  a  form  suitable  for 
machine  processing.  The  feature  extractor  receives  the 
output  of  the  transducer  and  extracts  a  set  of  feature 
measurements  which  represent  the  nature  of  the  data. 
Finally, these feature measurements are received by a
classifier  which  assigns  the  input  data  elements  to  one  of 
the  possible  classes. 

Each  of  the  components  described  above  is  dependent  to  a 
varying  degree  on  the  particular  problem  being  considered. 
The  design  or  specification  of  a  suitable  transducer  is 
highly  problem  dependent  and  is  not  considered  in  this 
report. Of the remaining two components, feature extraction
is in general much more problem dependent than
classification. Many useful techniques for feature
extraction exist, some of which are discussed in (1,2).
While the problem of feature extraction is not considered
specifically in this study, it is important to realize the
connection between it and the classification problem. The
better the input is represented by the feature extractor,
the easier the classification task becomes(2).


1.2  Approaches  to  Classifier  Design 


The  approaches  used  to  design  automatic  pattern 
recognition systems can be divided into three categories(1).
These categories are template matching, decision theoretic
methods,  and  syntactic  recognition.  Template  matching  is 
based  on  the  idea  of  comparison  of  input  samples  to  a  set  of 
stored  templates  which  represent  each  of  the  possible 
classes.  The  decision  theoretic  techniques  attempt  to 
formulate  a  set  of  classification  rules  which  are  defined  by 
a function of the sample features. The third technique,
syntactic pattern recognition, suggests that the sample
patterns can be represented using a hierarchical structure
present in the data. Each of these methods, while utilizing
different procedures, results in some form of decision rule
for  data  classification. 

The template matching approach is based on a comparison
technique. That is, an unknown sample is compared to a set
of templates stored in the system until a match is obtained.
An application where template matching has been utilized
successfully is that of character recognition(1). By
limiting the form of the characters under consideration,
such as typed characters of uniform size, the problem can be
reduced to a manageable form. Typically, a set of
measurements which allows unique representation of each
permitted character is available to the system. Then, an
input sample (character) is examined and the same set of
measurements is recorded. As a result, all that is
necessary to classify the sample is to compare the
measurements obtained to those stored in the system until a
match is found. Clearly, this technique places exacting
restrictions on the samples under consideration(1). Also,
if the number of characters allowed is very large, the
storage requirements may be burdensome and the time required
to search for a match may be excessive.

The  decision  theoretic  approach  may  be  subdivided  into 
either  deterministic  or  probabilistic  techniques.  The 
deterministic  techniques  utilize  analytic  functions  to 
provide  a  functional  description  of  the  decision  rule(2). 
As  an  example  of  a  deterministic  pattern  classifier, 
consider  the  1-nearest  neighbor  pattern  classifier.  The 
nearest neighbor classifier finds the nearest labeled
sample (i.e. one of known class) using a distance measure. The
distance measure can be of varying type; the Euclidean
distance is one commonly used. Once the nearest neighbor
has been found, the sample pattern is assigned to the class
of this closest neighbor. Ties are resolved arbitrarily.
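
The 1-nearest neighbor rule described above can be sketched in a few lines. This is only an illustrative sketch, not the implementation used in this report: the function names, the data layout (a list of feature-vector/label pairs), and the sample values are all assumptions made for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def nearest_neighbor_class(sample, labeled):
    """Return the class label of the labeled sample closest to `sample`.

    `labeled` is a list of (feature_vector, class_label) pairs.
    Ties are resolved arbitrarily: min() keeps the first minimum it meets.
    """
    _, label = min(labeled, key=lambda pair: euclidean(sample, pair[0]))
    return label

# Hypothetical labeled samples from two classes.
labeled = [((0.0, 0.0), "A"), ((1.0, 0.0), "A"), ((5.0, 5.0), "B")]
print(nearest_neighbor_class((4.0, 4.5), labeled))
```

Any other distance measure could be substituted for `euclidean` without changing the rule itself.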

The  probabilistic  mathematical  techniques  utilize  the 
statistical  properties  of  the  pattern  classes  to  achieve  a 
decision  rule(2).  A  probability  density  function  describing 
the  distribution  of  the  class  is  obtained  and  used  to 
formulate  the  decision  rule.  One  example  of  this  method  is 
the  Bayes  classifier.  The  Bayes  classifier  is  typically 
used  when  the  density  functions  are  assumed  to  be 
multivariate normal (i.e. the data is normally
distributed)(1). The mean and covariance matrix
corresponding to each class under consideration are
obtained by either direct calculation or an approximation
technique. With these parameters, the normal density
function is completely defined. The density function for
each class is then evaluated for the sample pattern under
consideration, these values are combined with the
probability of occurrence of each class, and the sample is
then assigned to the class for which the resultant value is
a maximum. As with the nearest neighbor classifier, ties are
resolved arbitrarily. The mathematical techniques are
generally not as restrictive as those of the template
matching method. Nevertheless, these techniques are also
dependent upon the application considered.
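
The Bayes decision rule just described can be written out explicitly. The notation below is an assumption introduced for illustration and does not appear in the text: $\omega_i$ denotes the ith class, $P(\omega_i)$ its probability of occurrence, $M_i$ and $C_i$ its mean vector and covariance matrix, and $d$ the number of features.

```latex
p(x \mid \omega_i) = \frac{1}{(2\pi)^{d/2}\,|C_i|^{1/2}}
    \exp\!\left[ -\tfrac{1}{2}\,(x - M_i)^T C_i^{-1} (x - M_i) \right]

\text{assign } x \text{ to } \omega_i \quad \text{if} \quad
p(x \mid \omega_i)\,P(\omega_i) \;\ge\; p(x \mid \omega_j)\,P(\omega_j)
\quad \text{for all } j
```

In this form, "combining" the density value with the probability of occurrence amounts to taking their product.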


The  syntactic  approach  to  the  pattern  recognition  problem 
utilizes  the  structure  existing  in  the  sample  classes. 
Formal  language  theory  is  applied  to  describe  the  levels  of 
structure  present  in  terms  of  a  particular  grammar.  Of 
course,  this  approach  presupposes  the  existence  of  some  form 
of  recognizable  structure.  As  a  result,  syntactic 
techniques  are  best  applied  to  problems  in  which  the 
structure  present  can  be  characterized  in  some  concise  form. 
Syntactic techniques have been used in pictorial pattern
recognition as well as in other areas. Of the three
approaches discussed, the syntactic approach is the least
developed, but its use has been receiving increased
attention recently. The theory of syntactic pattern
recognition  is  covered  extensively  in  (3),  and  (1,2)  provide 
an  introductory  look  at  the  topic. 

The  work  presented  here  considers  mathematical  methods 
for  classifier  design.  More  specifically,  an  investigation 
into  algorithms  based  on  fuzzy  set  theory  is  presented  with 
comparisons  to  their  crisp  analogs.  The  algorithmic  pattern 
recognition  techniques  discussed  are  deterministic  in 
nature,  except  for  one  probabilistic  method  based  on  Bayes 
decision  theory.  Of  course,  since  the  lines  which  separate 
the  three  approaches  are  not  hard  and  fast,  it  is  important 
to  be  able  to  draw  from  any  of  the  three  when  developing  an 
effective  classifier. 


1.3 Introduction to Fuzzy Sets


The theory of fuzzy sets was developed by Lotfi Zadeh in
1965(4). The impetus behind the introduction of the fuzzy
set was to provide a means of defining categories which are
inherently imprecise(5). While it is a relatively new
concept, the theory is a natural extension of traditional
set theory. Since the introduction of fuzzy set theory, the
terms "hard" and "crisp" have been used to describe sets
conforming to the traditional set theory. Although it has
taken some time for its use to spread, the theory of fuzzy
sets has been applied successfully to a variety of areas.
These include medical diagnosis, linguistic modelling,
artificial intelligence, and scene analysis as well as
pattern  recognition.  The  results  achieved  in  these 
applications  are  useful  and  have  stimulated  further  research 
in  the  area. 

Prior to the introduction of fuzzy sets, probability
theory was the primary mathematical means of describing
imprecision. Although many people still believe that
probability theory is all that is needed to handle problems
which are inherently imprecise, failure to examine all
possible methods of achieving a solution will very likely
lead to a less than optimal solution. Upon comparison of
the imprecision, or fuzziness, which is modelled by fuzzy set
theory to the randomness which probability theory models so
well, it should be clear that the two theories are distinct.
Consider the statement: "You are nearly correct". Using
probability theory this would be modelled as: "There is an
X% possibility that you are correct." (X could be something
near 90). But the intent of such a statement is to say that
the response you supplied is close to the correct one, not,
as probability theory suggests, that there is a good chance
you are correct. Alternatively, fuzzy set theory models the
statement as: "The correctness, on a scale from zero to one,
of your answer is X." (X could be near 0.9). This is the
true intent of the statement given above. Thus, the
difference between fuzzy modelling and stochastic modelling
is that fuzzy set theory handles imprecision easily whereas
probability theory is best suited to random processes(5).
So, it is not a matter of which theory is best, but instead
which theory is best suited to the problem at hand.

The basis of fuzzy set theory is that set elements may
take on a membership other than complete membership
(membership=1) or non-membership (membership=0).
Thus, as is often the case in real world situations, a set
may consist of elements with varying degrees of similarity.
The measure of similarity is assigned via a membership
function. In traditional, or "crisp", set theory the
membership function values are restricted to zero or one.
But in fuzzy set theory an element's membership function
may take on any value in the closed interval [0,1]. Thus
more flexibility results by using fuzzy set theory to
describe classes and their members. For a more complete
discussion of the theory of fuzzy sets, the reader is
referred to (6) which provides an excellent and thorough
presentation of the theory.


The following examples illustrate the usefulness of fuzzy
sets for describing classes which exist in the world.
Consider the class of all young people in the world.
Clearly,  we  must  define  what  attribute  a  person  must  possess 
to  be  considered  young  before  determining  who  belongs  in  the 
class.  One  might  say  that  a  person  is  young  if  they  are 
less  than  some  particular  age.  Then,  using  crisp  set  theory 
to  describe  this  class  we  simply  say  that  everyone  less  than 
the  given  age  is  young  and  all  others  are  not.  On  the  other 
hand,  using  fuzzy  set  theory  we  assign  a  membership  in  the 
class  of  young  people  to  all  persons  considered.  Thus, 
using  age  to  define  young  people,  a  five  year  old  person 
might  have  a  membership  of  0.95  in  the  class  while  a  ninety 
year  old  person  might  have  a  membership  of  0.1  in  the  class. 
Clearly,  the  latter  description  provides  more  information  to 
the  observer  and  as  a  result  should  be  more  useful  to 
someone concerned with young people of the world. As
another illustration, consider the classic example of the
set of all bald persons. As before, we must define what
attribute a person must possess to be considered a member of
the  set.  Should  a  person  whose  hairline  has  receded  a 
couple  of  inches  be  a  member?  Of  course  a  person  with  no 
hair  will  be  a  member.  But  where  do  we  set  the  defining 
line  for  baldness?  Without  going  any  further  with  this 
example  it  should  be  clear  that  crisp  set  theory  will  not 
provide  much  help  in  identifying  the  set  of  bald  people 
unless  we  all  can  agree  at  what  stage  a  person  is  considered 
bald. 
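
The contrast between the crisp and fuzzy descriptions of the class of young people can be made concrete with two membership functions. The sketch below is illustrative only: the crisp cutoff of 30 years and the linear fall-off from age 0 to age 100 are assumptions chosen for the example, and the text does not prescribe any particular membership function.

```python
def crisp_young(age, cutoff=30.0):
    """Crisp membership: 1 if younger than the (assumed) cutoff, else 0."""
    return 1.0 if age < cutoff else 0.0

def fuzzy_young(age, full=0.0, none=100.0):
    """Fuzzy membership falling linearly from 1 at `full` to 0 at `none`.

    The linear shape and both end points are assumptions for illustration.
    """
    if age <= full:
        return 1.0
    if age >= none:
        return 0.0
    return (none - age) / (none - full)

for age in (5, 29, 31, 90):
    print(age, crisp_young(age), fuzzy_young(age))
```

With these assumed end points, a five year old receives a membership of 0.95 and a ninety year old a membership of 0.1, as in the example above, whereas the crisp function treats a 29 year old and a 31 year old as completely different.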

With  just  the  two  examples  presented  above  it  should  be 
clear  that  there  are  many  cases  in  the  world  where  the 
models  based  on  crisp  set  theory  fall  short  of  providing  a 
useful  description  of  things,  people,  or  places.  So,  as 
Professor  Zadeh  proposed,  the  use  of  fuzzy  set  theory  may 
indeed  perform  better  in  these  cases. 


CHAPTER 2


FUZZY CLUSTERING


2.1 Introduction


As  discussed  in  chapter  one,  evaluation  of  large  data 
sets  can  be  a  difficult  task  simply  because  of  the  volume  of 
data  present.  One  way  of  reducing  the  data  is  to  use  a 
clustering  procedure  to  extract  information  from  the  raw 
data.  Roughly  speaking,  clustering  procedures  yield  a  data 
description  in  terms  of  clusters,  or  groups  of  data  points 
which possess some form of similarity(2).

When the clustering procedures are based on crisp set
theory, a sample in the data set must be classified as
belonging to one and only one cluster(1). This constraint
is imposed by the mathematical model based on crisp set
theory. As an example, consider the case of a set of data
samples taken from three classes, one being a hybrid of the
other two classes and resembling each non-hybrid to the same
degree. If we have no prior knowledge of the actual number
of classes present and partition the data into two clusters,
the following should occur. If the two non-hybrid classes
are separable, samples taken from these classes will be
placed in different clusters. But what of the samples from
the hybrid class? These samples will likely be divided
between the two clusters, resulting in clusters which are
distorted from their natural shape and density. In addition,
the samples from the hybrid class will be lost amongst the
other samples with nothing to indicate a difference in their
origin.

Alternatively,  a  fuzzy  clustering  technique  does  not  have 
the  same  constraint  as  that  imposed  by  crisp  set  theory. 
Instead,  samples  are  assigned  membership  in  all  classes(5). 
Returning to the example problem described above, samples
from the two non-hybrid classes will have high membership
(close to one) in one cluster and low membership (close to
zero) in the second cluster. Of course, the two non-hybrid
classes will have high membership in different clusters.
Now, consider the samples from the hybrid class. Since they
do not resemble one non-hybrid more than another, they will
be assigned membership in each cluster very close to
one-half. Thus, these samples will be recognized as not
belonging to one cluster more than another, as they should
be. This example points out the essence of fuzzy clustering.
That is, fuzzy clustering procedures do not force a sample
into one and only one cluster. Instead, a sample's "degree
of belonging" in a particular cluster can be interpreted via
its membership assignments.


In both the crisp and fuzzy cases of the example given
above, as in most clustering procedures, the technique is to
assign individual data points to clusters such that the
resulting clusters produce a natural grouping of the data.
Of course, we must define what is meant by a natural
grouping. Typically this is defined by a measure of
similarity between samples as well as a criterion for
evaluating the partition which results from the clustering
procedure(2). Thus, the choice of similarity measure and
criterion function in a clustering procedure strongly
influences the type of clusters obtained.


2.2  Similarity  Measures 


The  similarity  measure  used  in  a  clustering  procedure 
defines  what  mathematical  properties  of  the  data  should  be 
used  to  identify  clusters(5).  Properties  such  as  distance, 
angle, curvature, symmetry, and intensity are some which may
be of interest. Clearly, no one measure of similarity will
be universally applicable. Often the choice of one measure
over another is a subjective one, with considerations of
prior knowledge and ease of implementation playing a role.

The most obvious measure of similarity between two
samples is the distance between them(2). Of course there
are several ways in which the distance between two points
can be defined. The Euclidean distance squared between two
sample vectors $X_i$ and $Z_j$,

$$D^2 = (X_i - Z_j)^T (X_i - Z_j)$$

is one commonly used measure of similarity, with a smaller
distance corresponding to a greater similarity. Use of the
Euclidean distance to test similarity in a clustering
procedure produces clusters which are hyperspherical(2).
The (squared) Mahalanobis distance from a sample vector $X_i$
to a mean vector $M$,

$$D^2 = (X_i - M)^T C^{-1} (X_i - M)$$

is a useful measure of similarity when the statistical
properties of the data are being considered (here $C^{-1}$ is
the inverse covariance matrix of the sample data). This
similarity measure produces clusters which are
hyperellipsoids(2). Distance measures as a form of
comparison are by no means the only useful similarity
measures.


A nonmetric similarity measure between two vectors $X$ and
$Z$,

$$S = \frac{X^T Z}{\|X\| \, \|Z\|}$$

represents the cosine of the angle between the two vectors X
and Z. This similarity measure is a maximum when the
vectors are oriented in the same direction with respect to
the origin. Thus, this measure is useful when clusters tend
to align themselves along the principal axes(1).
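
The three similarity measures discussed in this section can be sketched as follows. The function names and the representation of vectors as tuples and of the inverse covariance matrix as nested lists are assumptions made for illustration; the inverse covariance matrix is taken as given rather than computed.

```python
import math

def squared_euclidean(x, z):
    """Squared Euclidean distance (x - z)^T (x - z)."""
    return sum((xi - zi) ** 2 for xi, zi in zip(x, z))

def squared_mahalanobis(x, mean, cov_inv):
    """Squared Mahalanobis distance (x - mean)^T C^-1 (x - mean).

    `cov_inv` is the inverse covariance matrix as a list of rows.
    """
    d = [xi - mi for xi, mi in zip(x, mean)]
    size = len(d)
    return sum(d[r] * cov_inv[r][c] * d[c]
               for r in range(size) for c in range(size))

def cosine_similarity(x, z):
    """Cosine of the angle between two non-zero vectors x and z."""
    dot = sum(xi * zi for xi, zi in zip(x, z))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_z = math.sqrt(sum(zi * zi for zi in z))
    return dot / (norm_x * norm_z)

print(squared_euclidean((0.0, 0.0), (3.0, 4.0)))
print(squared_mahalanobis((2.0, 0.0), (0.0, 0.0), [[0.25, 0.0], [0.0, 1.0]]))
print(cosine_similarity((1.0, 0.0), (2.0, 0.0)))
```

Note that the two distances decrease as similarity increases, while the cosine measure increases.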

The  similarity  measures  given  above  are  some  of  those 
commonly  used  in  clustering  problems.  Of  course  many  more 
similarity  measures  exist,  some  of  which  are  discussed  in 
(1,2,5).  For  the  pattern  recognition  algorithms  considered 
in this report, the Euclidean distance measure is used.
This similarity measure was chosen since on the whole very
little prior knowledge concerning the types of clusters to
expect in the test data was available. In addition, by
using this measure exclusively, variability in results due
solely to the use of different distance measures was
eliminated.


2.3  Criterion  Functions 


In order to obtain the set of K clusters, or subsets of a
sample set, which are the "most desirable", we need to define
a criterion function which measures the quality of the
clusters found. The "most desirable" clusters are those
whose samples are more similar to one another than to
samples contained in a different cluster(2). Thus, once
such a criterion function is defined, partitioning the data
such that the criterion function is at an extremum (maximum
or minimum) will produce the "most desirable" clusters
obtainable under the given criterion. Of course, the result
does not necessarily represent the naturally occurring
clusters, if any, in the set of samples. The extent to
which the clusters obtained represent the naturally
occurring clusters is dependent upon the particular choices
for a similarity measure and criterion function(2).

The most widely used clustering criterion function is the
sum-of-squared-error criterion(1). Let $n_j$ be the number of
samples in a proposed cluster and let $m_j$ be the mean of
those samples,

$$m_j = \frac{1}{n_j} \sum_{i=1}^{n_j} x_i$$

The sum of squared errors is defined as

$$J_e = \sum_{j=1}^{K} \sum_{i=1}^{n_j} \| x_i - m_j \|^2$$

where K is the number of clusters and $n_j$ is the number of
vectors in the jth cluster. The interpretation of this
criterion function is as follows. For a given cluster, the
mean vector $m_j$ is the best representative of the samples in
the cluster in the sense that it minimizes the sum of
squared lengths of the "error" vectors $x_i - m_j$(2). As a
result, $J_e$ measures the total squared error incurred by
representing the n samples by the K cluster centers. Then
the optimal partition as defined by this criterion function
is one which minimizes $J_e$. Clusters resulting from the use
of the sum-of-squared-error as the criterion function are
often called "minimum variance" partitions(2). The fuzzy
analog to the sum-of-squared-error criterion is very much
like the one given above. The difference is that the
distance measure for each vector x is multiplied by x's
membership in the class raised to the power m, where m is a
weighting factor usually taken as two (not to be confused
with the mean vectors $m_j$).
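
The sum-of-squared-error criterion and its fuzzy analog can be sketched as below. This is an illustration under stated assumptions rather than the code used in this study: the function names are invented, a partition is represented as a list of non-empty lists of vectors, and the fuzzy version takes a membership array `u` with `u[i][j]` the membership of the jth sample in the ith cluster.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def mean_vector(samples):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]

def sum_squared_error(clusters):
    """Crisp criterion: total squared distance of samples to their cluster mean."""
    total = 0.0
    for members in clusters:
        center = mean_vector(members)
        total += sum(sq_dist(x, center) for x in members)
    return total

def fuzzy_sum_squared_error(samples, centers, u, m=2.0):
    """Fuzzy analog: each squared distance is weighted by membership ** m."""
    return sum((u[i][j] ** m) * sq_dist(x, v)
               for i, v in enumerate(centers)
               for j, x in enumerate(samples))

# Two candidate partitions of the same three hypothetical samples.
good = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 10.0)]]
bad = [[(0.0, 0.0)], [(2.0, 0.0), (10.0, 10.0)]]
print(sum_squared_error(good), sum_squared_error(bad))
```

The partition that keeps the two nearby samples together yields the smaller criterion value, which is why it would be preferred under this criterion. When every membership is exactly zero or one, the fuzzy criterion reduces to the crisp one.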


The clustering problems best suited to the use of $J_e$ are
those in which the data form well separated, compact
"clouds". One problem arising from the use of this criterion
when there is a large difference in the number of samples in
different clusters is that the large clusters may be split,
because the small reduction in squared error is multiplied
by many terms in the sum(5). Situations producing such a
problem often occur when there exist single points well
away from the more dense regions of the cluster.


There  are  other  useful  criterion  functions,  several  of 
which  are  discussed  in  (2).  The  common  feature  of  the 
criterion function presented above as well as those in (2)
is that they model the clustering problem as one in which
the samples form well separated "clouds" of points. While
this model may be reasonable in some cases, it does not
represent the majority of the clustering problems which are
of concern. As a simple example consider the case of the
"cloud within a cloud", a dense cluster embedded in the
center of a diffuse cluster. Clearly, utilizing a criterion
which uses the model described above will not likely produce
a useful partition. Nevertheless, criteria such as the
minimum squared error function are often used as a starting
point; a different criterion function must then be devised
if the results are not meaningful.


2.4  Clustering  Methods 

2.4.1  Hierarchical  Methods 


This group of methods was originally utilized in the
field of biological taxonomy where individuals are grouped
into species, species into genera, genera into families, and
so on(2). Hierarchical clustering contains both
agglomerative (merging) and divisive (splitting) techniques.
In both cases the procedure is to form new clusters by
reallocating membership of one point at a time, based on a
given similarity measure(5). Thus, the resulting clusters
form a hierarchy of nested clusters. Because of their
conceptual and computational simplicity, hierarchical
methods are among the best known(2). They are suitable for
use when the underlying structure of the data is
dendritic(5). An introductory look at the methods of
hierarchical clustering is presented in (2). In addition, a
discussion of fuzzy hierarchical clustering techniques is
presented in (7).


2.4.2 Graph-Theoretic Methods


In  this  group  the  set  of  samples  is  regarded  as  a  node 
set,  and  edge  weights  between  pairs  of  nodes  can  be  based  on 
a  similarity  measure  between  pairs  of  nodes(5).  The 
clustering  criterion  may  be  some  measure  of  connectivity 
between  groups  of  nodes.  Breaking  of  edges  in  a  minimal 
spanning tree to form subgraphs is an often used
graph-theoretic clustering strategy(5). The benefit of
graph-theoretic techniques is that they allow consideration
of more intricate structures than the isolated "cloud-like"
clusters produced by the mathematics of normal mixtures and
minimum-variance partitions(2).


2.4.3  Objective  Function  Methods 


These methods generally allow the most precise (but not
necessarily more valid) formulation of the clustering
criterion(5). Objective function methods make use of
criterion functions, such as those described above, as a
measure of each clustering candidate's "desirability". Thus,
the optimum clusters under these methods are those which
produce local extrema of the objective function(5). The
K-means algorithm described in the following section is of
this type.


2.5  K-means  Algorithm 


This algorithm, in both the fuzzy and crisp versions, is
based  on  the  minimization  of  the  within-group  sum  of  squared 
error  criterion.  Both  the  fuzzy  and  crisp  algorithms  are 
given  below.  The  crisp  K-means  algorithm  is  included  to 
provide  a  comparison  between  fuzzy  and  crisp  clustering 
results.  The  notation  used  in  the  algorithms  is  as  follows. 


K = number of clusters specified
n = number of data samples
$n_i$ = number of sample vectors in the ith cluster
{X} = the set of n sample vectors
m = weighting factor
$U^l$ = membership function array for the lth iteration
$u_{ij}$ = the membership of the jth vector in the ith cluster
$\{V^l\}$ = the set of K fuzzy cluster centers for the lth iteration
$\{Z^l\}$ = the set of crisp cluster means for the lth iteration
$\|A\|$ = any matrix norm (for example the max of the absolute values of all elements)


The procedure for the fuzzy K-means algorithm is as follows.


BEGIN
    Set K, $2 \le K < n$
    Set $\epsilon$, $\epsilon > 0$
    Set m, $1 < m < \infty$
    Initialize $U^0$
    Initialize l = 0
    DO UNTIL ( $\|U^l - U^{l-1}\| < \epsilon$ )
        Increment l
        Calculate $\{V^l\}$ using 2.5a and $U^{l-1}$
        Compute $U^l$ using 2.5b and $\{V^l\}$
    END DO UNTIL
END


2.5a
$$v_i = \frac{\sum_{j=1}^{n} (u_{ij})^m \, x_j}{\sum_{j=1}^{n} (u_{ij})^m}$$

2.5b
$$u_{ij} = \frac{\left( 1 / \|x_j - v_i\|^2 \right)^{1/(m-1)}}{\sum_{k=1}^{K} \left( 1 / \|x_j - v_k\|^2 \right)^{1/(m-1)}}$$
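
The fuzzy K-means procedure can be sketched in code as follows. This is a minimal illustration, not the program used for the results in this report. It assumes the usual form of the two update equations referred to as 2.5a and 2.5b (membership-weighted means, and memberships proportional to inverse squared distance raised to 1/(m-1)). The function names, the `max_iter` safeguard, and the handling of a sample that coincides with a center are assumptions that do not come from the text.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def fuzzy_centers(samples, u, m):
    """Update 2.5a: one membership-weighted mean per cluster."""
    dims = range(len(samples[0]))
    centers = []
    for row in u:  # row[j] is the membership of sample j in this cluster
        weights = [uij ** m for uij in row]
        total = sum(weights)
        centers.append([sum(w * x[d] for w, x in zip(weights, samples)) / total
                        for d in dims])
    return centers

def fuzzy_memberships(samples, centers, m):
    """Update 2.5b. A sample lying exactly on a center is given full
    membership there (an assumption; 2.5b is undefined at zero distance)."""
    u = [[0.0] * len(samples) for _ in centers]
    for j, x in enumerate(samples):
        d2 = [sq_dist(x, v) for v in centers]
        if 0.0 in d2:
            u[d2.index(0.0)][j] = 1.0
            continue
        inv = [(1.0 / d) ** (1.0 / (m - 1.0)) for d in d2]
        total = sum(inv)
        for i, value in enumerate(inv):
            u[i][j] = value / total
    return u

def fuzzy_k_means(samples, u0, m=2.0, eps=1e-4, max_iter=100):
    """Iterate 2.5a and 2.5b until the largest membership change is below eps."""
    u = u0
    for _ in range(max_iter):  # max_iter is a safeguard added for this sketch
        centers = fuzzy_centers(samples, u, m)
        new_u = fuzzy_memberships(samples, centers, m)
        change = max(abs(a - b) for ra, rb in zip(new_u, u)
                     for a, b in zip(ra, rb))
        u = new_u
        if change < eps:
            break
    return centers, u

# Hypothetical data: two well separated pairs of points, K = 2.
samples = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
u0 = [[0.7, 0.7, 0.3, 0.3], [0.3, 0.3, 0.7, 0.7]]
centers, u = fuzzy_k_means(samples, u0)
print(centers)
print(u)
```

The stopping test uses the max-of-absolute-values matrix norm mentioned in the notation list; any other matrix norm could be substituted.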


The crisp K-means algorithm is as follows.


BEGIN
    Set K, $2 \le K \le n$
    Initialize $\{Z_i^0\}$ by arbitrary assignment
        as vectors in the sample set
    Initialize l = 0
    DO UNTIL ( $\|Z_i^l - Z_i^{l-1}\| < \epsilon$, $\forall\, i = 1,2,\ldots,K$ )
        Increment l
        Assign each $x \in \{X\}$ to the jth cluster if
            $\|x - Z_j^{l-1}\| < \|x - Z_i^{l-1}\|$ $\forall\, i = 1,2,\ldots,K$, $i \ne j$
        Compute $\{Z_i^l\}$ using 2.5c
    END DO UNTIL
END


2.5c
$$Z_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_j$$
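
For comparison, the crisp K-means procedure can be sketched in the same style. Again this is only an illustration under stated assumptions: the function names, the `max_iter` safeguard, the rule that ties go to the lowest-numbered cluster, and the choice to leave the mean of an empty cluster unchanged are not taken from the text.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def crisp_k_means(samples, initial_means, eps=1e-9, max_iter=100):
    """Alternate nearest-mean assignment and the mean update of 2.5c."""
    means = [list(z) for z in initial_means]
    clusters = []
    for _ in range(max_iter):  # max_iter is a safeguard added for this sketch
        # Assign each sample to the cluster with the nearest current mean.
        clusters = [[] for _ in means]
        for x in samples:
            d2 = [sq_dist(x, z) for z in means]
            clusters[d2.index(min(d2))].append(x)  # ties: lowest index wins
        # Recompute each mean; an empty cluster keeps its old mean (assumption).
        new_means = []
        for old, members in zip(means, clusters):
            if members:
                new_means.append([sum(col) / len(members)
                                  for col in zip(*members)])
            else:
                new_means.append(old)
        shift = max(sq_dist(a, b) for a, b in zip(new_means, means)) ** 0.5
        means = new_means
        if shift < eps:
            break
    return means, clusters

# Same hypothetical data; the initial means are the first two samples.
samples = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
means, clusters = crisp_k_means(samples, samples[:2])
print(means)
print(clusters)
```

Unlike the fuzzy version, each sample ends up in exactly one cluster, so nothing in the result indicates how strongly a sample belongs there.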


As the statements of the algorithms illustrate, both are
relatively simple procedures. Although neither of these
algorithms has a general convergence proof associated with
it, both have been shown to provide useful
results(1,5). In the case of the fuzzy K-means algorithm, a
proof of convergence under certain conditions to a local
minimum of the within-group sum of squared error criterion
exists(5).

While both algorithms are useful in determining the
existence of a set of K clusters in some data sets, the
correct choice of K is no straightforward task. Often the
algorithm must be run on the data for several values of K.
Then the results of all runs must be interpreted (usually by
hand) to determine the number of naturally occurring
clusters, if any. One advantage of the fuzzy K-means
algorithm is that the interpretation is eased by the
availability of the membership function array. When the
memberships show most samples with a high membership in only
one cluster, this suggests that the choice of K which
produced the results may best represent the number of
naturally occurring clusters.

The initialization steps required for these algorithms
are quite different. For the crisp K-means algorithm a set
of K initial cluster centers must be chosen. Usually a
random assignment will produce good results. The fuzzy
K-means algorithm, in contrast, requires a little more
effort in order to obtain useful results. The initial
membership function array cannot in general be assigned
arbitrarily. One procedure for initialization of the array
is to obtain a crisp partition and then "fuzzify" it by
changing each vector's memberships so that the vector shares
membership among the classes, with its maximum membership in
the class in which the crisp partition placed it(5).
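The "fuzzify" step described above can be sketched as follows. This is a minimal illustration and not the procedure of (5): the function name `fuzzify`, the peak membership of 0.7, and the equal sharing of the remaining membership among the other classes are all assumptions chosen for the example.

```python
def fuzzify(crisp_labels, n_classes, peak=0.7):
    """Turn a crisp partition into an initial membership array.

    Each row sums to one and has its maximum in the crisp class.
    `peak` is a hypothetical parameter, not a value from the text.
    """
    if n_classes < 2 or not (1.0 / n_classes < peak < 1.0):
        raise ValueError("peak must lie strictly between 1/n_classes and 1")
    # Membership left over after the peak, shared equally by the other classes.
    rest = (1.0 - peak) / (n_classes - 1)
    return [[peak if c == label else rest for c in range(n_classes)]
            for label in crisp_labels]
```

For example, `fuzzify([0, 2], 3, peak=0.8)` gives the first sample membership 0.8 in class 0 and about 0.1 in each of the other two classes.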

A great deal of research concerning the fuzzy K-means
algorithm has been conducted. Several individual as well as
joint efforts have been completed by James Bezdek and Joseph
Dunn(5,8,9,10). On the whole, as indicated by their
research results, the fuzzy K-means algorithm is a useful
tool for cluster analysis. Additional results and a
comparison between the fuzzy and crisp algorithms are given
in chapter four.


CHAPTER 3


FUZZY CLASSIFIERS


3.1 Introduction


While clustering procedures are utilized when the nature
of a set of unlabelled samples is being investigated,
classification routines have a different purpose. Given a
set of unlabelled samples, a classification algorithm should
be able to determine their correct classification. There
are several approaches to the classifier problem, as
discussed in chapter one. In this chapter the nearest
neighbor and nearest prototype classifiers are considered.

Both the nearest neighbor and nearest prototype
classifiers utilize labelled samples and a distance measure
to determine classification. In the case of the nearest
neighbor classifier the labelled samples are used directly.
The nearest prototype classifier compares the samples of
unknown class to a set of prototypical samples representing
the possible classes.


3.2 Nearest Neighbor Classifiers

The nearest neighbor classifiers require no preprocessing
of the labelled sample set prior to their use. The crisp
nearest neighbor classification rule assigns an input sample
vector y, of unknown classification, to the class of its
nearest neighbor(1). This idea can be extended to K nearest
neighbors, with the vector y being assigned to the class
which is represented by a majority amongst the K nearest
neighbors. Of course, when more than one neighbor is
considered, a tie may occur among the classes which have the
maximum number of neighbors in the group of K nearest
neighbors. One simple way of handling this problem is to
restrict the possible values of K. For example, given a two
class problem, if we restrict K to odd values only, no tie
will be possible. When more than two classes are possible
this technique is not useful. In this study a tie is handled
as follows. The sample vector is assigned to the class, of
those classes which tied, for which the sum of distances
from the sample to each neighbor in the class is a minimum.
This could still lead to a tie, in which case the assignment
is to the last class encountered amongst those which tied,
an arbitrary assignment. Clearly, there will be cases where
a vector's classification becomes an arbitrary assignment no
matter what additional procedures are included in the
algorithm.


3.2.1 Crisp Nearest Neighbor Algorithm


Let $W = \{x_1, x_2, \ldots, x_n\}$ be a set of n labelled samples.
The algorithm is as follows.


BEGIN
    Input y, of unknown classification
    Set K, $1 \le K \le n$
    Initialize i = 1

    DO UNTIL ( K nearest neighbors found )
        Compute distance from y to $x_i$
        IF ( $i \le K$ ) THEN
            Include $x_i$ in the set of K nearest neighbors
        ELSE IF ( $x_i$ is closer to y than
                  any previous nearest neighbor ) THEN
            Delete farthest in the set of K nearest neighbors
            Include $x_i$ in the set of K nearest neighbors
        END IF
        Increment i
    END DO UNTIL

    Determine the majority class represented in the set
        of K nearest neighbors
    IF ( a tie exists ) THEN
        Compute sum of distances of neighbors in each
            class which tied
        IF ( no tie occurs ) THEN
            Classify y in the class of minimum sum
        ELSE
            Classify y in the class of last minimum found
        END IF
    ELSE
        Classify y in the majority class
    END IF
END
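
The crisp K-nearest neighbor rule, including its two levels of tie-breaking, can be sketched in Python as follows. The sketch sorts all labelled samples by distance instead of maintaining the running set of K nearest neighbors used in the pseudocode, which gives the same neighbor set when distances are distinct. The function name `crisp_knn`, the Euclidean distance, and the interpretation of "last class encountered" as the last tied class in sorted label order are assumptions.

```python
import math
from collections import Counter

def crisp_knn(y, samples, labels, k):
    """Classify y by majority vote among its k nearest labelled samples."""
    # Sorting by distance stands in for the running "K nearest" set.
    order = sorted(range(len(samples)), key=lambda j: math.dist(y, samples[j]))
    nearest = order[:k]
    votes = Counter(labels[j] for j in nearest)
    top = max(votes.values())
    tied = sorted(c for c in votes if votes[c] == top)
    if len(tied) == 1:
        return tied[0]
    # Tie on votes: sum the neighbor distances within each tied class.
    sums = {c: sum(math.dist(y, samples[j]) for j in nearest if labels[j] == c)
            for c in tied}
    smallest = min(sums.values())
    # If the sums also tie, fall back to the last tied class (an arbitrary choice).
    return [c for c in tied if sums[c] == smallest][-1]
```

With two labelled samples at (0, 0) and (4, 0) in classes "a" and "b" and K = 2, the point (1, 0) is assigned to "a" by the distance sum, while the equidistant point (2, 0) falls through to the arbitrary assignment.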


3.2.2 Fuzzy Nearest Neighbor Classifier


While the fuzzy K-nearest neighbor procedure is also a
classification algorithm, the form of its results differs
from the crisp version. The fuzzy K-nearest neighbor
algorithm assigns class membership to a sample vector rather
than assigning the vector to a particular class. The
advantage is that no arbitrary assignments are made by the
algorithm. In addition, the vector's membership values
should provide a level of assurance to accompany the
resultant classification. For example, if a vector is
assigned 0.9 membership in one class and 0.05 membership in
two other classes we can be reasonably sure the class of 0.9
membership is the class to which the vector belongs. On the
other hand, if a vector is assigned 0.55 membership in class
one, 0.44 membership in class two, and 0.01 membership in
class three, then we should be hesitant to assign the vector
based on these results, although we can feel confident that
it does not belong to class three. In such a case the
vector might be examined further to determine its
classification. Clearly the membership assignments produced
by the algorithm can be useful in the classification
process.

The basis of the algorithm is to assign membership as a
function of the vector's distance from its K nearest
neighbors and those neighbors' memberships in the possible
classes. The fuzzy algorithm is similar to the crisp
version in the sense that it must also search the labelled
sample set for the K nearest neighbors. Beyond obtaining
these K samples, the procedures differ considerably.


3.2.2.1 Fuzzy Nearest Neighbor Algorithm


Let $X = \{x_1, x_2, \ldots, x_n\}$ be the set of n labelled samples.
Also let $u_i(x)$ be the assigned membership of the vector x (to
be computed), and $u_{ij}$ be the membership in the ith class of
the jth vector of the labelled sample set.


BEGIN
    Input x, of unknown classification
    Set K, $1 \le K \le n$
    Initialize i = 1

    DO UNTIL ( K nearest neighbors to x found )
        Compute distance from x to $x_i$
        IF ( $i \le K$ ) THEN
            Include $x_i$ in the set of K nearest neighbors
        ELSE IF ( $x_i$ closer to x than
                  any previous nearest neighbor ) THEN
            Delete the farthest of the K nearest neighbors
            Include $x_i$ in the set of K nearest neighbors
        END IF
        Increment i
    END DO UNTIL

    Initialize i = 1
    DO UNTIL ( x assigned membership in all classes )
        Compute $u_i(x)$ using 3.2.2.1a below
        Increment i
    END DO UNTIL
END

3.2.2.1a    $u_i(x) = \dfrac{\sum_{j=1}^{K} u_{ij} \left( 1 / \|x - x_j\|^{2/(m-1)} \right)}{\sum_{j=1}^{K} \left( 1 / \|x - x_j\|^{2/(m-1)} \right)}$
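
A sketch of the membership assignment of 3.2.2.1a in Python is given below. It is an illustration under stated assumptions, not the implementation used in this study: the function name `fuzzy_knn`, the Euclidean distance, the default m = 2, and the handling of a zero distance (averaging the memberships of the coincident labelled samples) are all choices made for the example. The weighting factor m must be greater than one.

```python
import math

def fuzzy_knn(x, samples, memberships, k, m=2.0):
    """Class memberships of x following 3.2.2.1a.

    memberships[j][i] is the membership of labelled sample j in class i.
    """
    nearest = sorted(range(len(samples)),
                     key=lambda j: math.dist(x, samples[j]))[:k]
    dists = [math.dist(x, samples[j]) for j in nearest]
    n_classes = len(memberships[0])
    hits = [j for j, d in zip(nearest, dists) if d == 0.0]
    if hits:
        # Assumption: x coincides with labelled samples, so copy their memberships.
        return [sum(memberships[j][i] for j in hits) / len(hits)
                for i in range(n_classes)]
    # Inverse-distance weights with exponent 2/(m-1).
    weights = [1.0 / d ** (2.0 / (m - 1.0)) for d in dists]
    total = sum(weights)
    return [sum(w * memberships[j][i] for w, j in zip(weights, nearest)) / total
            for i in range(n_classes)]
```

Because the weights are normalized by their sum, the returned memberships add to one whenever each labelled sample's memberships do.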

3.2.2.2 Physical Interpretation


The interpretation of the algorithm is given in terms of
the following example. As stated previously, selection of
the K nearest neighbors from the labelled sample set is
straightforward. So, with K = 3 the example proceeds assuming
the 3 nearest neighbors of x are $x_1$, $x_2$, and $x_3$. The class
memberships for these three sample vectors are given as:


$u_{ij}$ = membership of jth sample in the ith class, j = 1,2,3.


The distances of x from $x_1$, $x_2$, and $x_3$ are $d_1$, $d_2$, and $d_3$
respectively.

Now according to 3.2.2.1a,

$u_i(x) = \dfrac{u_{i1} (1/d_1)^{2/(m-1)} + u_{i2} (1/d_2)^{2/(m-1)} + u_{i3} (1/d_3)^{2/(m-1)}}{\sum_{j=1}^{3} (1/d_j)^{2/(m-1)}}$

Thus, the assigned memberships of x are influenced by the
inverse of the distances from the nearest neighbors and
their class memberships. The inverse distance serves to
weight a vector's membership more if it is closer and less
if it is farther from the vector under consideration. The
labelled samples can be assigned class memberships in one of
two ways. First, they can be given complete membership in
their known class and non-membership in all other classes.
The second alternative is to assign the samples membership
based on distance from their class mean or distance from
labelled samples of the other class or classes, and then use
the resulting memberships in the classifier. Both of these
techniques have been used in this study and the results are
reported in chapter four.
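
As a worked illustration of 3.2.2.1a, suppose (hypothetically) that K = 3 and m = 2, that the neighbor distances are $d_1 = 1$, $d_2 = 2$, and $d_3 = 4$, and that the labelled samples carry crisp memberships with $x_1$ and $x_3$ in class 1 and $x_2$ in class 2. These numbers are assumptions chosen only to make the arithmetic easy; they are not taken from the test data. With m = 2 the exponent $2/(m-1)$ equals 2, so:

```latex
\begin{align*}
(1/d_1)^2 &= 1, \quad (1/d_2)^2 = \tfrac{1}{4}, \quad (1/d_3)^2 = \tfrac{1}{16},
  \qquad \sum_{j=1}^{3} (1/d_j)^2 = \tfrac{21}{16} \\
u_1(x) &= \frac{1 \cdot 1 + 0 \cdot \tfrac{1}{4} + 1 \cdot \tfrac{1}{16}}{21/16}
        = \frac{17}{21} \approx 0.81 \\
u_2(x) &= \frac{0 \cdot 1 + 1 \cdot \tfrac{1}{4} + 0 \cdot \tfrac{1}{16}}{21/16}
        = \frac{4}{21} \approx 0.19
\end{align*}
```

The nearest neighbor dominates: although two of the three neighbors belong to class 1 in any case, the far neighbor $x_3$ contributes only 1/16 of the weight of $x_1$.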


3.3  Nearest  Prototype  Classifiers 


These  classifiers  bear  a  marked  resemblance  to  the 
1-nearest  neighbor  classifier.  Actually,  the  only 
difference  is  that  for  the  nearest  prototype  classifier  the 
labelled  samples  are  a  set  of  class  prototypes  whereas  in 
the  nearest  neighbor  classifier  we  use  a  set  of  labelled 
samples  which  are  not  necessarily  prototypical.  Of  course, 
the  nearest  prototype  classifier  could  be  extended  to 
multiple  prototypes  representing  each  class,  similar  to  the 
K-nearest  neighbor  routine.  Nevertheless  this  study 
considers  only  the  1-nearest  prototype  classifier  in  both  a 
crisp  and  fuzzy  version.  The  prototypes  used  for  these 
routines are taken as the class means of the labelled sample
set.


3.3.1 Crisp Nearest Prototype Algorithm


Let $W = \{Z_1, Z_2, \ldots, Z_K\}$ be the set of K prototype vectors
representing the K classes.


BEGIN
    Input x, vector to be classified
    Initialize i = 1

    DO UNTIL ( distance from each prototype to x computed )
        Compute distance from $Z_i$ to x
        Increment i
    END DO UNTIL

    Determine minimum distance to any class prototype
    IF ( tie exists ) THEN
        Classify x as last class found of minimum distance
    ELSE
        Classify x as class of closest prototype
    END IF
END
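
The crisp nearest prototype rule, with the prototypes taken as class means as described in section 3.3, can be sketched as follows. The function names `class_means` and `crisp_nearest_prototype` and the use of Euclidean distance are assumptions made for the example; "last class found" is interpreted here as the last class in the order the prototypes are stored.

```python
import math

def class_means(samples, labels):
    """Prototype for each class: the mean of its labelled samples."""
    groups = {}
    for x, lab in zip(samples, labels):
        groups.setdefault(lab, []).append(x)
    return {lab: [sum(col) / len(xs) for col in zip(*xs)]
            for lab, xs in groups.items()}

def crisp_nearest_prototype(x, prototypes):
    """Return the class whose prototype is closest to x."""
    best_class, best_dist = None, math.inf
    for lab, z in prototypes.items():
        d = math.dist(x, z)
        if d <= best_dist:  # "<=" keeps the last class found when distances tie
            best_class, best_dist = lab, d
    return best_class
```

Only one distance per class is computed, which is why this classifier is cheaper than the K-nearest neighbor search over the whole labelled set.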


3.3.2  Fuzzy  Nearest  Prototype  Algorithm 


As above, let $W = \{Z_1, Z_2, \ldots, Z_K\}$ be the set of K prototypes
representing the K classes.


BEGIN
    Input x, vector to be classified
    Initialize i = 1

    DO UNTIL ( distance from each prototype to x computed )
        Compute distance from $Z_i$ to x
        Increment i
    END DO UNTIL

    Initialize i = 1
    DO UNTIL ( x assigned membership in all classes )
        Compute $u_i(x)$ using 3.3.2a below
        Increment i
    END DO UNTIL
END


3.3.2a    $u_i(x) = \dfrac{1 / \|x - Z_i\|^{2/(m-1)}}{\sum_{j=1}^{K} \left( 1 / \|x - Z_j\|^{2/(m-1)} \right)}$


The  difference  between  3.3.2a  and  3.2.2.1a  is  that 
membership  in  each  class  is  assigned  based  only  on  the 
distance  from  the  class  prototype.  This  is  because  the 
prototypes  should  naturally  be  assigned  complete  membership 
in  the  class  which  they  represent. 
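
The membership assignment of 3.3.2a can be sketched in Python as follows. The function name `fuzzy_nearest_prototype`, the Euclidean distance, the default m = 2, and the treatment of a sample that coincides with a prototype are assumptions made for this illustration.

```python
import math

def fuzzy_nearest_prototype(x, prototypes, m=2.0):
    """Membership of x in each class following 3.3.2a.

    prototypes is a list with one prototype vector per class; m must exceed 1.
    """
    dists = [math.dist(x, z) for z in prototypes]
    if 0.0 in dists:
        # Assumption: a sample lying on a prototype gets full membership there.
        hits = dists.count(0.0)
        return [1.0 / hits if d == 0.0 else 0.0 for d in dists]
    # Inverse-distance weights with exponent 2/(m-1), normalized to sum to one.
    weights = [1.0 / d ** (2.0 / (m - 1.0)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]
```

Taking the class of maximum membership from this result gives the same decision as the crisp nearest prototype rule whenever the closest prototype is unique, since the weights decrease with distance.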


CHAPTER 4


RESULTS AND CONCLUSIONS


4.1 Introduction


The results presented in this chapter were produced by
software implementation of the algorithms presented in
chapters two and three. The software was developed using
Fortran 77 on a Perkin-Elmer 3220. In addition, UMC Core
Graphics support software was utilized to allow a geometric
interpretation of the two-dimensional clustering and
classification results.


4.2 Test Data

Four labelled data sets were utilized to test the
algorithms. The data sets and their attributes are as
follows.


Data Set     Number of    Number of    Number of features
name         classes      vectors      per vector

IRIS         3            150          4
IRIS23       2            100          4
TWOCLASS     2            242          4
LANDSAT      4            32018        4

The IRIS data is that of Anderson(11). This particular
data set has been utilized extensively by researchers in the
area of cluster analysis since 1936, when R.A. Fisher first
used it to illustrate the concept of linear discriminant
analysis(5). The data represents three subspecies of
irises, with the four feature measurements being sepal
length, sepal width, petal length, and petal width, all in
centimeters. There are fifty vectors per class in this data
set. The IRIS23 data set is a subset of the IRIS data. It
includes classes two and three, the non-separable classes,
of the IRIS data.

The TWOCLASS data set is an artificially generated
normally distributed set of vectors. This data set was
included because classification results from a Bayes
classifier were available to use in the comparison. This
data set contains 121 samples per class.

The LANDSAT data set is a set of images taken by Landsat-2
on April 22, 1981 and August 9, 1981. Features one and two
were produced in April and features three and four in
August. Features one and three were produced by identical
sensor types, as were two and four. This data set was used
exclusively in the clustering evaluation. As with the
TWOCLASS data, results from a different clustering procedure
run on this data were available for comparison. For
additional information concerning the data source refer to
(12). The clustering results available were produced by a
statistically oriented algorithm entitled SEARCH, which is
also described in (12).

The IRIS data and TWOCLASS data sets were utilized in
evaluation of both the clustering and classification


4.3 Clustering Results and Computational Requirements


4.3.1  Clustering  Results 


As a basis for comparison the results of the fuzzy
clustering algorithm are reported as a crisp partition
wherein a vector is assigned to the cluster of maximum
membership. With these results, shown in Tables 1, 1a, and
1b, a comparison of the crisp and fuzzy algorithms can be
made. The percentages given in the tables indicate the rate
of correct classification, for individual classes and
combined results.

The results are presented in the form of confusion
matrices. These matrices are organized as follows. The
samples counted in each row are those which belong to the
corresponding class, and the samples counted in each column
are those placed into the corresponding cluster. Thus, the
rows give the vectors in the corresponding class and the
columns give the resultant cluster assignments.
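
The organization of the confusion matrices and the percentages derived from them can be sketched as follows. The function names are hypothetical, and the sketch assumes that cluster i has already been matched to class i, so that the diagonal holds the correctly labelled samples.

```python
def confusion_matrix(true_classes, clusters, n):
    """Rows are known classes, columns are assigned clusters (both 0..n-1)."""
    matrix = [[0] * n for _ in range(n)]
    for t, c in zip(true_classes, clusters):
        matrix[t][c] += 1
    return matrix

def correct_rates(matrix):
    """Per-class and overall rates of correct classification."""
    per_class = [row[i] / sum(row) for i, row in enumerate(matrix)]
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    overall = correct / sum(sum(row) for row in matrix)
    return per_class, overall
```

Applied to the fuzzy K-means matrix for the IRIS data in Table 1, `correct_rates([[50, 0, 0], [0, 47, 3], [0, 13, 37]])` reproduces the tabulated 100%, 94%, and 74% per class and 89.3% overall.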

Consider first the results shown for the IRIS and
TWOCLASS data sets. In the case of the TWOCLASS data the
results are the same for both crisp and fuzzy clustering.
The results of the two clustering procedures do show a
difference for the IRIS data, although the difference in
error rate is less than 1%, hardly significant.

Next, examine the results shown for the LANDSAT data.
First of all, the numbers in the results of the SEARCH
procedure differ by a scale factor because they are reported
in terms of acreage whereas the other results are reported
in terms of pixel count. Comparing only the results of the
fuzzy clustering to the crisp clustering it should be clear
that the fuzzy K-means algorithm performed much better than
the crisp version. Actually, the only reason the crisp
algorithm's results show an overall rate of correct
labelling above fifty percent is that the majority of the
sample points are from a single class.

Now, if we compare the results of the fuzzy K-means
algorithm to those of the SEARCH algorithm the following
observations can be made. First of all, the overall rate of
correct labelling for SEARCH is higher than that of the
fuzzy results. But, by examining the results of the
individual classes we can see that the fuzzy clustering
routine did better, on the average, for the individual
classes. In addition, while the SEARCH procedure is
considered unsupervised clustering, it does involve user
interpretation of intermediate results (a much larger number
of clusters), which is then given to the algorithm in terms
of desired cluster combinations so that the final cluster
count will be as specified, which in this case is four(12).

From the above results it should be clear that the fuzzy
K-means algorithm performs as well as, and in some cases
better than, the crisp K-means algorithm.


4.3.2  Role  of  the  Weighting  Factor 


The weighting factor (m) used in the fuzzy algorithms
influences the results of these algorithms in an interesting
manner. The results of the fuzzy K-means algorithm when m is
varied over a range of values are presented in Tables 2 and
2a.

The first thing to notice from the results listed in
Table 2 is that the rate of correct classification increases
without exception for both data sets as the value of m is
increased over the range. In addition, as m is increased
the resulting memberships become "fuzzier", as expected(5).
That is, on the average, the membership assignments given
are closer to 0.5, the region where it would seem the
membership of a vector would be more difficult to
distinguish. Nevertheless, in the empirical results
presented in Tables 2 and 2a, the "fuzzier" memberships do
not cause the error rate to increase; instead it decreases.
While these results are not conclusive, they do show that
the fuzzy K-means algorithm can outperform the crisp K-means
algorithm.


4.3.3  Computational  Requirements 


The computational requirements of the crisp and fuzzy
K-means algorithms will now be considered. The numbers of
multiplications and additions are compared in the general
case and for a particular example. The counts of
multiplications and additions for each algorithm are
reported in terms of the parameters listed below.


K = number of clusters specified
N = number of data samples
n = number of features per sample

                   Fuzzy         Crisp

Multiplications    KN(3+2n)      Kn(2+N)
Additions          3KN(1+n)      2Kn(1+N)


Using the parameters for the IRIS data set, the following
particular example is given.


K = 3
N = 150
n = 4

                   Fuzzy         Crisp

Multiplications    4950          1824
Additions          6750          3624

Without a doubt there is a trade-off involved when using
the fuzzy K-means algorithm as opposed to the crisp K-means
algorithm: the fuzzy algorithm requires more arithmetic, but
it provides more information, in the form of cluster
memberships, than the crisp K-means algorithm.
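
The operation counts above can be evaluated for other problem sizes with a short helper. The function name `kmeans_op_counts` is hypothetical; the formulas are those tabulated in this section, and K = 3 is the value that reproduces the IRIS figures.

```python
def kmeans_op_counts(K, N, n):
    """Multiplication and addition counts from the formulas of section 4.3.3."""
    return {
        "fuzzy": {"multiplications": K * N * (3 + 2 * n),
                  "additions": 3 * K * N * (1 + n)},
        "crisp": {"multiplications": K * n * (2 + N),
                  "additions": 2 * K * n * (1 + N)},
    }

counts = kmeans_op_counts(K=3, N=150, n=4)
ratio = counts["fuzzy"]["multiplications"] / counts["crisp"]["multiplications"]
```

For the IRIS parameters the fuzzy algorithm needs roughly 2.7 times the multiplications of the crisp algorithm (4950 against 1824), which quantifies the trade-off noted above.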


4.4  Classifier  Results  and  Computational  Requirements 


As with fuzzy clustering, results of the fuzzy
classifications are reported in terms of a crisp partition
wherein a sample vector is assigned to the class of maximum
membership. The classifications are obtained using the
"leave one out" technique. The procedure is to leave one
sample out of the data set and classify it using the
remaining samples as the labelled data set. This step
is repeated until all samples in the data set have been
classified. In addition, in order to evaluate one technique
used to initialize memberships of the labelled samples used
in the classifier, the IRIS23 data set was created by using
only classes two and three of the IRIS data set. This was
necessary because the initialization technique will only
work on two class classification problems.
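
The "leave one out" procedure can be sketched as follows. The 1-nearest neighbor rule is used only as a stand-in classifier so that the example is self-contained; the function names and the Euclidean distance are assumptions, and any classifier with the same calling convention could be passed in its place.

```python
import math

def nearest_neighbor(y, samples, labels):
    """1-nearest neighbor rule, used here only as a stand-in classifier."""
    j = min(range(len(samples)), key=lambda i: math.dist(y, samples[i]))
    return labels[j]

def leave_one_out_errors(samples, labels, classify=nearest_neighbor):
    """Count misclassifications when each sample is classified by the rest."""
    errors = 0
    for i, (x, true_label) in enumerate(zip(samples, labels)):
        # Remove sample i and use the remainder as the labelled data set.
        rest_samples = samples[:i] + samples[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if classify(x, rest_samples, rest_labels) != true_label:
            errors += 1
    return errors
```

Each sample is classified against N - 1 labelled samples, which is consistent with N = 149 appearing in the IRIS computational example for a data set of 150 vectors.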


4.4.1 Results of Nearest Neighbor Classifiers


Before comparing the results produced by the nearest
neighbor algorithms, the types of labelling techniques used
for the fuzzy classifier are explained. Three different
techniques of membership assignment for the labelled data
are considered. The first method, a crisp labelling, is to
assign each labelled sample complete membership in its known
class and zero membership in all other classes. The second
technique assigns membership based on the procedure
presented in (13). This technique works only on two class
data sets. The procedure assigns a sample membership in its
known class based on its distance from the mean of the
labelled sample class. These memberships range from one to
one-half with an exponential rate of change between these
limits. The sample's membership in the other class is
assigned such that the sum of the memberships of the vector
equals one. A more detailed explanation of this technique
is given in (13). The third method considered assigns
memberships to the labelled samples according to a K-nearest
neighbor rule. The K (not the K of the classifier) nearest
neighbors to each sample (x) are found and then membership
in the known class i is assigned according to the following
equation.

$u_i(x) = 0.51 + (n_i / K) \times 0.49$

Membership assignments in the remaining classes are made
according to (C = number of classes):

$u_j(x) = (n_j / K) \times 0.49, \quad j = 1, 2, \ldots, C, \; j \neq i$

The value $n_i$ is the number of the neighbors found which
belong to the ith class and the value $n_j$ is the number of the
neighbors found which belong to the jth class. This method
attempts to "fuzzify" the memberships of the labelled
samples which are in the class regions which intersect in
the sample space and leave the samples which are well away
from this area with complete membership in the known class.
As a result, an unknown sample lying in this intersecting
region will be influenced to a lesser extent by the labelled
samples which are in the "fuzzy" area of the class boundary.
This initialization technique would work better on the
problem of the "cloud within a cloud" discussed in section
2.3.
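
The third (K-nearest neighbor) labelling technique can be sketched in Python as follows. The function name `init_memberships`, the parameter name `k_init`, the Euclidean distance, and the exclusion of a sample from its own neighbor list are assumptions made for this illustration; class labels are taken to be integers from 0 to C-1.

```python
import math

def init_memberships(samples, labels, n_classes, k_init):
    """Assign fuzzy memberships to labelled samples by the K-nearest rule."""
    result = []
    for idx, x in enumerate(samples):
        # Assumption: a sample is not counted among its own neighbors.
        others = [j for j in range(len(samples)) if j != idx]
        nearest = sorted(others, key=lambda j: math.dist(x, samples[j]))[:k_init]
        counts = [0] * n_classes
        for j in nearest:
            counts[labels[j]] += 1
        # (n_j / K) * 0.49 for every class, plus 0.51 for the known class.
        row = [0.49 * counts[c] / k_init for c in range(n_classes)]
        row[labels[idx]] += 0.51
        result.append(row)
    return result
```

A sample whose neighbors all share its class receives membership one in that class, while a sample surrounded by the other class keeps only 0.51 in its own, so the known class always retains the majority membership.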


Thus, with these three initialization techniques three
sets of results of the fuzzy K-nearest neighbor classifier
are produced. These results are presented in Tables 3 and 4.
Upon comparison of the results of the crisp classifier and
the fuzzy classifier with crisp initialization we can see
that on the average these procedures have equal error rates.
In addition, the fuzzy classifier which uses the second
initialization technique described produced nearly equal
results. Although not reported in the tables, the results
of this fuzzy classifier using the membership assignment
rule described in (13) did not produce memberships for the
misclassified vectors which suggest they actually belong to
a different class. Instead this second initialization
technique causes an overall reduction in the values of
memberships assigned, with most of the samples given
majority memberships less than 0.7. But the nearest neighbor
initialization technique does seem to produce membership
assignments which give an indication of degree of
correctness of classification.

Examining the results given in Table 4 for the K-nearest
neighbor classifier with nearest neighbor sample membership
initialization, the following observations can be made.
First of all, the results show a somewhat lower overall
error rate. But, more importantly, the number of
misclassified vectors with high assigned membership in the
wrong class is quite small for certain choices of KINIT. In
addition, the correctly classified samples were given
relatively higher membership in their known class than in
other classes.

As a final comparison, consider the results of the Bayes
classifier for the TWOCLASS data. Running a ten percent
jackknife procedure (taking ten percent of the samples as
test data and the remaining as training data, classifying
these, and then repeating the procedure until all samples
have been used as test samples) and assuming equal a priori
probabilities for both classes, the Bayes classifier
misclassified twenty of the samples. Clearly, dependent on
the value chosen for K, the fuzzy nearest neighbor
classifier can perform as well as a Bayes classifier.


4.4.2  Nearest  Neighbor  Computational  Requirements 


The computational requirements of the crisp and fuzzy
classifiers are now considered in terms of the number of
multiplications and additions required to classify a sample.
The parameters which influence the number of multiplications
and additions required are as follows.

C = number of classes
K = number of neighbors used to classify
N = number of labelled samples used
n = number of features per sample vector


                   Fuzzy            Crisp

Multiplications    nN+C(2K+1)       nN
Additions          2nN+K+2CK        2nN+2K+CK+C-1


Using the parameters for the IRIS data set and setting
K = 3, the following particular example is given.


C = 3
N = 149
n = 4


                   Fuzzy            Crisp

Multiplications    617              596
Additions          1213             1209


As this example illustrates, there is little difference
in the computational requirements of the crisp and fuzzy
K-nearest neighbor algorithms.
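
The counts for the nearest neighbor classifiers can be checked with a short helper. The function name `knn_op_counts` is hypothetical; the formulas are those tabulated in this section. The value K = 3 is inferred, since it is the value for which the formulas reproduce the tabulated IRIS figures.

```python
def knn_op_counts(C, K, N, n):
    """Operation counts per classified sample from the formulas of section 4.4.2."""
    return {
        "fuzzy": {"multiplications": n * N + C * (2 * K + 1),
                  "additions": 2 * n * N + K + 2 * C * K},
        "crisp": {"multiplications": n * N,
                  "additions": 2 * n * N + 2 * K + C * K + C - 1},
    }

counts = knn_op_counts(C=3, K=3, N=149, n=4)
extra = counts["fuzzy"]["multiplications"] - counts["crisp"]["multiplications"]
```

The term nN, the cost of the distance computations, dominates both columns, so the membership calculation adds only C(2K+1) = 21 multiplications in this example.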


4.4.3  Results  of  Nearest  Prototype  Classifiers 


The 1-nearest prototype classifiers, in both the crisp and
fuzzy versions, are the quickest and simplest of the
classifiers considered. The reason is as follows. In both
versions of the 1-nearest prototype algorithm, an unknown
sample is compared to one prototype per class, as opposed to
the K-nearest neighbor algorithms wherein an entire set of
labelled samples representing each class must be compared
before the "K" nearest are obtained. The results reported
in Table 5 show that the fuzzy nearest prototype classifier
and the crisp nearest prototype classifier produced
equivalent results. But, by looking at the memberships of
the misclassified samples in terms of the number with
membership greater than 0.7 in the wrong class, given in
Table 6, it is clear that these memberships do provide a
useful measure of level of confidence of classification.
Further, the number of correctly classified samples with
memberships in the range between 0.5 and 0.7 is small
compared to the number of correctly classified samples.
This means that most of the correctly classified samples
have membership in the correct class greater than 0.7.
Thus, the assigned memberships give assurance that these
samples are correctly classified.


4.4.4  Nearest  Prototype  Computational  Requirements 


The computational requirements of the two classifiers are
examined below. The number of multiplications and additions
required for classification of a sample is given in terms of
the parameters defined below.


C = number of classes
n = number of features per vector


                   Fuzzy            Crisp

Multiplications    C(2+n)           Cn
Additions          C(2n+1)          2Cn


As with the previous comparisons, a particular example is
given using the IRIS data.


C = 3
n = 4


                   Fuzzy            Crisp

Multiplications    18               12
Additions          27               24


As with the nearest neighbor classifiers, there is little
difference in the computational requirements of the crisp
and fuzzy 1-nearest prototype classifiers.


4.5  Conclusions 


The fuzzy K-means algorithm considered is a viable
alternative for use in clustering problems. While
considerable research concerning this algorithm has already
been conducted, the role of the weighting factor has not
been investigated sufficiently. The results reported above
indicate that the effect of using values higher than two for
the weighting factor deserves further investigation.

The fuzzy K-nearest neighbor and fuzzy 1-nearest
prototype algorithms developed and investigated in this
report show useful results. In particular, concerning the
fuzzy K-nearest neighbor algorithm with fuzzy K-nearest
neighbor labelled sample membership assignments, the
membership assignments produced for classified samples tend
to possess desirable qualities. That is, an incorrectly
classified sample will not have a membership in any class
close to one while a correctly classified sample does
possess a membership in the correct class close to one. The
fuzzy 1-nearest prototype classifier, while not producing
error rates as low as the fuzzy nearest neighbor classifier,
also seems to produce membership assignments which are
desirable.

Clearly, the results reported herein indicate that the
fuzzy pattern recognition algorithms considered in this
research are useful and should be further investigated.
Table 1

CLUSTERING RESULTS - IRIS data

Four Features

         Fuzzy K-means                   Crisp K-means

         1    2    3                     1    2    3
    1   50    0    0   100%         1   50    0    0   100%
    2    0   47    3    94%         2    0   48    2    96%
    3    0   13   37    74%         3    0   14   36    72%

Overall correct rate 89.3%          Overall correct rate 89.3%
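
The per-class and overall correct rates in Table 1 follow directly from the confusion matrices. The short Python sketch below recomputes them as a check; the function name is hypothetical, and rows are taken to be true classes with columns as assigned clusters.

```python
def correct_rates(confusion):
    """Return (per-class correct rates, overall correct rate) for a square
    confusion matrix whose rows are true classes."""
    per_class = [row[i] / sum(row) for i, row in enumerate(confusion)]
    correct = sum(row[i] for i, row in enumerate(confusion))
    total = sum(sum(row) for row in confusion)
    return per_class, correct / total


fuzzy_kmeans = [[50, 0, 0], [0, 47, 3], [0, 13, 37]]
crisp_kmeans = [[50, 0, 0], [0, 48, 2], [0, 14, 36]]

if __name__ == "__main__":
    for name, matrix in (("fuzzy", fuzzy_kmeans), ("crisp", crisp_kmeans)):
        per_class, overall = correct_rates(matrix)
        print(name, [round(100 * r, 1) for r in per_class],
              round(100 * overall, 1))
```

Both matrices have 134 of 150 samples on the diagonal, which is why the two overall rates agree even though the per-class rates differ.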
Table 1a

CLUSTERING RESULTS - TWOCLASS data

Four Features

         Fuzzy K-means                   Crisp K-means

          1     2                         1     2
    1   114     7   94.2%           1   114     7   94.2%
    2    15   106   87.6%           2    15   106   87.6%

Overall correct rate 90.9%          Overall correct rate 90.9%

Features Three and Four

         Fuzzy K-means                   Crisp K-means

          1     2                         1     2
    1   114     7   94.2%           1   114     7   94.2%
    2    15   106   87.6%           2    15   106   87.6%

Overall correct rate 90.9%          Overall correct rate 90.9%
Table 1b

CLUSTERING RESULTS - LANDSAT data (four features)

Fuzzy K-means

             1       2       3       4
    1    11151    2951    2728     193    65.5%
    2     3178    5054     723      11    56.4%
    3      404     804    3429      14    73.3%
    4      157      43     413     762    55.4%

Overall correct rate 63.7%

Crisp K-means

             1       2       3       4
    1    14267    1463     315     978    83.3%
    2     5904    1575    1309     178    17.5%
    3     2370     440    1481     363    [illegible]
    4     1254      30      38      53     0.3%

Overall correct rate 54.3%

SEARCH

             1       2       3       4
    1     9143    1165     230       3    87%
    2     2088    3503      59       0    62%
    3      380    1158    1409       0    48%
    4       38     155     267     368    44%

Overall correct rate 72%
Table 2

Result of Varying the Weighting Factor (m)

         Vectors          Number of vector's membership in range
         misclassified    (in class of maximum membership)

                          > 0.6      > 0.7      > 0.8      > 0.9
    m      T    I         T    I     T    I     T    I     T    I

   1.4    22   17        18   15    13   14     8   12     4    7
   1.5    22   17        17   15    11   12     5    9     2    5
   1.6    22   17        17   14     9   12     4    6     1    1
   1.7    22   17        16   14     7   10     2    5     1    0
   1.8    22   17        16   13     5    8     2    2     0    0
   1.9    22   16        14   13     4    5     1    0     0    0
   2.0    22   16        10    8     4    5     1    0     0    0
   2.1    22   16         9    8     2    2     1    0     0    0
   2.2    22   16         9    8     2    0     0    0     0    0
   2.3    22   16         8    6     2    0     0    0     0    0
   2.4    21   15         7    6     2    0     0    0     0    0
   2.5    21   15         7    4     1    0     0    0     0    0
   2.6    21   15         6    3     1    0     0    0     0    0
   2.7    21   15         4    1     1    0     0    0     0    0
   2.8    21   15         4    0     1    0     0    0     0    0
   2.9    20   15         4    0     0    0     0    0     0    0
   3.0    20   15         4    0     0    0     0    0     0    0

Abbreviations:  T - TWOCLASS data    I - IRIS data
Table 2a

Result of Varying the Weighting Factor (m)

         Vectors          Number of vector's membership in range
         misclassified    (in class of maximum membership)

                          > 0.6      > 0.7      > 0.8      > 0.9
           T    I         T    I     T    I     T    I     T    I

          20   15         3    0     0    0     0    0     0    0
          20   14         1    0     0    0     0    0     0    0
          19   13         0    0     0    0     0    0     0    0
          19   13         0    0     0    0     0    0     0    0
          19   13         0    0     0    0     0    0     0    0
          19   12         0    0     0    0     0    0     0    0
          19   11         0    0     0    0     0    0     0    0
          19   11         0    0     0    0     0    0     0    0

Abbreviations:  T - TWOCLASS data    I - IRIS data
Table 3

Results of K-nearest Neighbor Classifiers

Number of Misclassified vectors

             Crisp           Fuzzy-(1)       Fuzzy-(2)     Fuzzy-(3)

    K      I    T   I'      I    T   I'       T   I'      I    T   I'

    1      6   26    6      6   26    6      26    6      6   26    6
    2      7   26    7      6   26    6      21    6      6   21    6
    3      6   21    6      6   22    6      21    7      5   19    6
    4      5   20    5      6   19    6      20    7      5   20    5
    5      5   20    5      5   21    5      20    7      4   19    4
    6      6   19    6      5   18    5      20    6      4   20    4
    7      5   19    5      5   21    5      18    6      4   19    4
    8      7   21    7      6   18    6      20    6      4   20    4
    9      6   21    6      4   21    4      18    5      4   18    4

Notation and Abbreviations:

    K   - number of neighbors used
    I   - IRIS data (four features)
    T   - TWOCLASS data (four features)
    I'  - IRIS23 data (four features)
    (1) - crisp initialization
    (2) - exponential initialization
    (3) - fuzzy 3-nearest neighbor initialization
Table 4

Results of Fuzzy K-nearest neighbor classifier,
with fuzzy KINIT-nearest neighbor initialization

                                   KINIT

              1             3             5             7             9

    K      I     T       I     T       I     T       I     T       I     T

    1     6-3  26-15    6-4  26-17    6-4  26-18    6-5  26-18    6-5  26-18
    2     6-4  23-17    6-4  21-13    6-4  23-14    6-4  22-13    6-4  22-11
    3     5-4  20-12    5-4  19-12    5-4  21-12    5-5  21-10    6-5  23-10
    4     5-4  17-12    5-4  20-11    5-4  19-10    5-4  19-10    5-4  19-9
    5     4-4  16-11    4-4  19-11    5-4  19-10    5-3  20-11    5-3  19-10
    6     4-4  20-10    4-4  20-11    4-4  20-11    4-3  21-9     4-3  20-8
    7     4-3  17-9     4-4  19-10    4-3  20-9     4-3  20-8     4-3  20-8
    8     4-3  17-9     4-3  20-9     4-2  20-9     4-2  20-8     4-2  20-8
    9     4-3  18-8     4-3  18-8     4-2  21-8     4-2  21-9     4-2  21-8

Abbreviations:  I - IRIS data (four features)
                T - TWOCLASS data (four features)

Note 1: Columns give results for the values of KINIT (the
        K used to initialize the labelled samples'
        memberships) shown. Rows give results for values
        of K in the K-nearest neighbor algorithm.

Note 2: Table entries are interpreted as: X-Y indicates
        X misclassified vectors with Y of the X given
        membership in the wrong class greater than 0.7.
Table 5

Results of the 1-Nearest Prototype Classifier

IRIS data

         Four Features                 Features Three and Four

      Crisp         Fuzzy              Crisp         Fuzzy

    1   50        1   50             1   50        1   50

TWOCLASS data

         Four Features                 Features Three and Four

      Crisp         Fuzzy              Crisp         Fuzzy

    1  113    8   1  113    8        1  113    8   1  113    8
    2   12  109   2   12  109        2   12  109   2   12  109
Table 6

Fuzzy Classifier Membership Assignments

                                   IRIS data      TWOCLASS data

                                    A     B         A     B

    Misclassified samples with
    membership assigned > 0.7       1     1         3     3

    Samples with membership
    assigned > 0.5 and < 0.7       15    15        36    36

Abbreviations:  A - Four features used
                B - Features three and four used

Note: The first row in the table above gives the
      number of misclassified vectors in the
      indicated range and the second row gives
      the number of all classified samples in the
      given range. The intent is to illustrate
      that very few samples are misclassified
      with high membership, while very few
      correctly classified samples are given
      membership in their class in the "fuzzy"
      region between 0.5 and 0.7.
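
The two counts reported in Table 6 can be obtained from a membership array with a few lines of code. The Python sketch below is illustrative only: the function name, the data layout (one membership list per sample, with integer true-class indices), and the sample values are assumptions, not taken from the report's routines.

```python
def membership_summary(memberships, true_labels, high=0.7, low=0.5):
    """Count (a) misclassified samples whose maximum membership exceeds
    `high`, and (b) all samples whose maximum membership lies strictly
    between `low` and `high`."""
    misclassified_high = 0
    fuzzy_region = 0
    for u, label in zip(memberships, true_labels):
        best = max(range(len(u)), key=lambda c: u[c])  # hard decision
        if best != label and u[best] > high:
            misclassified_high += 1
        if low < u[best] < high:
            fuzzy_region += 1
    return misclassified_high, fuzzy_region


if __name__ == "__main__":
    memberships = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.45, 0.55]]
    true_labels = [0, 0, 0, 1]  # hypothetical ground truth
    print(membership_summary(memberships, true_labels))
```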
$OPTIMIZE
C
C     A DRIVER ROUTINE WHICH ALLOWS SELECTION OF ONE OF THREE
C     DATA SETS, AND THEN SELECTION OF ONE OF SEVEN PATTERN
C     RECOGNITION ALGORITHMS TO BE RUN ON THE DATA SET
C     CHOSEN.  AFTER THE CHOSEN ALGORITHM HAS COMPLETED
C     THE USER IS PROMPTED FOR CHOICES OF INTERPRETIVE
C     ALGORITHMS TO RUN USING RESULTS PRODUCED BY
C     THE PATTERN RECOGNITION ALGORITHM CHOSEN.
C
C     WRITTEN BY:  MICHAEL R. GRAY
C     COMPLETED:  JUNE 84
C     FILENAME:  MGFUZZY.FTN
C     CALLING SEQUENCE:  RUN MGFUZZY
C
C     DATA FILES AVAILABLE:
C
C       IRIS.DAT     - 150 SAMPLES WITH FOUR FEATURES PER
C                      SAMPLE, 50 SAMPLES PER CLASS
C       TWOCLASS.DAT - 242 SAMPLES WITH FOUR FEATURES PER
C                      SAMPLE, 121 SAMPLES PER CLASS
C       IRIS23.DAT   - 100 SAMPLES WITH FOUR FEATURES PER
C                      SAMPLE, 50 SAMPLES PER CLASS
C                      (A SUBSET OF IRIS.DAT)
C
C     PATTERN RECOGNITION SUBROUTINES AVAILABLE (BY TYPE):
C
C          DESCRIPTION                         FILENAME
C
C       1  CRISP K-MEANS                       CRSKMEAN.FTN
C       2  FUZZY K-MEANS                       FUZKMEAN.FTN
C       3  FUZZY K-NEAREST NEIGHBOR            FUZNEARN.FTN
C       4  FUZZY 1-NEAREST PROTOTYPE           FUZPROTO.FTN
C       5  FUZZY K-NEAREST NEIGHBOR,           FUZNEARN.FTN
C          FUZZY INITIALIZATION
C            A - VIA "FZIFY";                  MGFZIFY.FTN
C                DEVELOPED BY D. HUNT
C            B - VIA "FZFYNN";                 MGFZFYNN.FTN
C                A NEAREST NEIGHBOR TECHNIQUE
C       6  CRISP K-NEAREST NEIGHBOR            CRSIPNN.FTN
C       7  CRISP 1-NEAREST PROTOTYPE           CRISPNP.FTN
C
C     NOTE: CLASSIFIER ALGORITHMS (3 THROUGH 7) USE
C           THE "LEAVE ONE OUT" METHOD TO PRODUCE A
C           CLASSIFICATION.  THE METHOD IS SIMPLY TO
C           LEAVE THE CURRENT SAMPLE BEING CLASSIFIED
C           OUT OF THE LABELLED SET USED TO
C           DETERMINE CLASSIFICATION.
C
C     INTERPRETIVE SUBROUTINES AVAILABLE:
C
C          DESCRIPTION                         FILENAME
C
C       1  COMPUTE HARD PARTITION USING        MGCMTRIX.FTN
C          MAXIMUM CLASS MEMBERSHIP,
C          RESULT IS A CONFUSION MATRIX
C       2  OUTPUT THE MEMBERSHIP FUNCTIONS,    MGMEMBPR.FTN
C            A - FOR ALL SAMPLES OF DATA SET
C            B - FOR ONLY THE MISCLASSIFIED
C                SAMPLES OF DATA SET
C       3  COMPUTE LEVEL SETS USING ALPHA      MGCUTSET.FTN
C          AND BETA AS UPPER AND LOWER
C          CUTOFFS, RESPECTIVELY.  BASICALLY,
C          A LEVEL SET IS DEFINED TO INCLUDE
C          THOSE SAMPLES WHICH HAVE MEMBERSHIP
C          IN THE DESIRED RANGE, EITHER GREATER
C          THAN ALPHA, LESS THAN BETA, OR IN-
C          BETWEEN ALPHA AND BETA
C       4  OUTPUT A 2-D PLOT OF RESULTS,       MGDOPLOT.FTN
C            A - RESULTANT CLASSIFICATION OF
C                EACH SAMPLE BY CLASS
C            B - MEMBERSHIP FUNCTIONS ASSIGNED
C                IN ONE CLASS
C            C - SAMPLES WHICH ARE A LEVEL SET
C                FOR A GIVEN CLASS
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     LOGICAL UNITS USED:
C
C       11 - TO ACCESS A DATA FILE, OPEN ONLY WHEN
C            ACCESSING THE DATA SET AND THEN NO
C            LONGER USED UNTIL ANOTHER DATA FILE
C            IS SELECTED
C        5 - USED TO READ AND WRITE TO CONSOLE,
C            ALWAYS ASSIGNED
C        6 - USED TO WRITE TO HARD COPY PRINTER,
C            ALWAYS ASSIGNED, THOUGH ONLY USED
C            WHEN USER SELECTS THE PRINTER AS
C            THE OUTPUT DEVICE
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     FINAL NOTE: USER INTERACTION IS REQUIRED AT
C           VARIOUS INTERVALS.  AFTER A PARTICULAR
C           PATTERN RECOGNITION ALGORITHM IS CHOSEN
C           NO FURTHER INTERACTION IS REQUIRED OTHER
C           THAN TO CHOOSE THE VALUE FOR "K" IF A
C           NEAREST NEIGHBOR ALGORITHM, OR CHOOSE THE
C           TYPE OF "FUZZY" INITIALIZATION IF SO CHOSEN
C           UNTIL THE ALGORITHM HAS COMPLETED AND
C           INTERPRETIVE OPTIONS BECOME AVAILABLE.
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C

      LOGICAL DONE
      CHARACTER*1 REPLY
      CHARACTER*15 VFORMT,DFILE
      REAL MFUNCT(3,242),PROTO(4,3),X(4,242)
      REAL NMFUNC(3,242),LOW
      INTEGER VECSIZ,VCOUNT,FFEAT,CLASS,VECTOR,CHOICE
      INTEGER FETUR,CHSIZE,VCLASS(3),TEST1,START(3)
      INTEGER ACOUNT(3),BCOUNT(3),ICOUNT(3),OPTION,POINTR
      INTEGER WRNCNT,KFINIT,FUZCH,VECREC,COUNT,END(3)
      INTEGER*2 KALPHA(3,242),KBETA(3,242),BETWEN(3,242)
      INTEGER*2 WRONG(242)
      COMMON /AREA1/X /AREA2/PROTO /AREA3/MFUNCT
      COMMON /AREA4/WRNCNT,WRONG
      COMMON /AREA5/NMFUNC /AREA6/START,END
      COMMON /AREA8/ACOUNT,BCOUNT,ICOUNT,KALPHA,
     1 KBETA,BETWEN

C
C *** EXECUTION BEGINS ***
C
C LOOP(UNTIL USER IS FINISHED)
1     CONTINUE
C
C FIND OUT WHICH DATA SET TO USE
C
      WRITE(5,2)
2     FORMAT(//,' Enter code for data set to use:',
     1 //,5X,'1-IRIS data-150 vectors: 3 classes-',
     2 '4 features',//,5X,'2-TWOCLASS data-242',
     3 ' vectors: 2 classes-4 features',//,5X,
     4 '3-IRIS23 data-100 vectors: 2 classes-',
     5 '4 features',/,7X,'(This file contains',
     6 ' classes two and three of IRIS)')
      READ(5,6) CHOICE
6     FORMAT(I1)
C
C DO CASE(CHOICE)
      GO TO (10,15,20) CHOICE
C
C CASE #1
10    CONTINUE
      DFILE='IRIS.DAT'
      VFORMT='(20F3.1)'
      GO TO 30
C END CASE #1
C
C CASE #2
15    CONTINUE
      DFILE='TWOCLASS.DAT'
      VFORMT='(4F10.6)'
      GO TO 30
C END CASE #2
C
C CASE #3
20    CONTINUE
      DFILE='IRIS23.DAT'
      VFORMT='(20F3.1)'
      GO TO 30
C END CASE #3
C
30    CONTINUE
C END DO CASE
C
C OPEN DATA FILE AND READ IN NUMBER OF CLASSES,
C SIZE OF DATA VECTORS, NUMBER OF VECTORS PER
C RECORD, AND NUMBER OF VECTORS PER CLASS.
C
      OPEN(11,FILE=DFILE)
C
      READ(11,31) KLASES,VECSIZ,VECREC
31    FORMAT(3I3)
      READ(11,31) (VCLASS(I),I=1,KLASES)
      VCOUNT=0
C
C DO UNTIL(REMAINING DATA SET DEPENDENT
C          VARIABLES INITIALIZED)
      DO 32 CLASS=1,KLASES
C
      VCOUNT=VCOUNT+VCLASS(CLASS)
      END(CLASS)=VCOUNT
      START(CLASS)=END(CLASS)-VCLASS(CLASS)+1
C
32    CONTINUE
C END DO UNTIL
C
C READ DATA VECTORS FROM DISK FILE,
C THEN CLOSE THE DATA FILE
C
C DO UNTIL(DATA VECTORS READ)
      DO 33 I=1,VCOUNT,VECREC
      READ(11,VFORMT) ((X(J,L),J=1,VECSIZ),
     1 L=I,I+VECREC-1)
33    CONTINUE
C END DO UNTIL
C
      CLOSE(11)
C
C LOOP(UNTIL ANOTHER DATA SET IS DESIRED)
34    CONTINUE
C
C LET USER KNOW HOW MANY FEATURES ARE AVAILABLE
C
      WRITE(5,35) VECSIZ
35    FORMAT(/,4X,'There are',I2,' features in each ',
     1 'vector.')
C
C SET FIRST FEATURE AND LAST FEATURE TO BE CONSIDERED
C WHEN COMPUTING MEMBERSHIP FUNCTION ARRAY.
C
      WRITE(5,40)
40    FORMAT(//,' Input first feature number in vectors',
     1 'to use(I1).')
      READ(5,*) FFEAT
      WRITE(5,45)
45    FORMAT(//,' Input last feature number in vectors',
     1 'to use(I1).')
      READ(5,*) LFEAT
C
C INPUT VALUE OF "FUZZIFIER" TO BE USED IN ALGORITHM
C
      WRITE(5,50)
50    FORMAT('1',' Input value of weighting factor',
     1 '("FUZZIFIER"):')
      READ(5,55) FZFIER
55    FORMAT(F3.1)
C
C TEST IF VALUE OF "FUZZIFIER" IN PROPER RANGE
C (ONLY ALLOW >= 1.3; SMALLER VALUES ARE RESET TO 1.3)
C
      IF(.NOT.(FZFIER.LT.1.3))GO TO 60
      FZFIER=1.3
C END IF
60    CONTINUE
C
C SET MAXIMUM MEMBERSHIP UPDATE ERROR ALLOWED
C IN FUZZY K-MEANS ALGORITHM
C
      EPSLON=0.001
C
C FIND OUT WHAT TO DO
C
      WRITE(5,65)
65    FORMAT(//,' Enter code for your choice:',
     1 //,5X,'1 - crisp K-means',
     2 //,5X,'2 - fuzzy K-means',
     3 //,5X,'3 - fuzzy K-nearest neighbor',
     4 //,5X,'4 - fuzzy 1-nearest prototype',
     5 //,5X,'5 - fuzzy K-nearest neighbor,',
     6 ' fuzzy initialization',
     7 //,5X,'6 - crisp K-nearest neighbor',
     8 //,5X,'7 - crisp 1-nearest prototype')
      READ(5,6) CHOICE
C
      GO TO (90,95,100,100,100,100,100) CHOICE
C
C CASE #1
90    CONTINUE
C
C INITIALIZE MEMBERSHIP FUNCTIONS,
C THEN RUN CRISP K-MEANS ALGORITHM
C
      HIGH=1.0
      LOW=0.0
      CALL INITMF(VCOUNT,FFEAT,LFEAT,KLASES,HIGH,LOW)
      CALL CRMEAN(KLASES,VCOUNT,FFEAT,LFEAT,EPSLON,
     1 ITERAT)
      GO TO 400
C END CASE #1
C
C CASE #2
95    CONTINUE
C
C INITIALIZE MEMBERSHIP FUNCTIONS,
C THEN RUN FUZZY K-MEANS ALGORITHM
C
      HIGH=0.98
      LOW=0.02
      CALL INITMF(VCOUNT,FFEAT,LFEAT,KLASES,HIGH,LOW)
      CALL FKMEAN(FZFIER,KLASES,FFEAT,LFEAT,VCOUNT,
     1 EPSLON,ITERAT)
      GO TO 400
C END CASE #2
C
C CASES #3,4,5,6,7
100   CONTINUE
      IF((CHOICE.EQ.3).OR.(CHOICE.EQ.5).OR.
     1 (CHOICE.EQ.6)) THEN
      WRITE(5,105)
105   FORMAT(/,' Input number of neighbors used to ',
     1 'assign membership function values(<10):')
      READ(5,6) K
      IF(CHOICE.EQ.5) THEN
      WRITE(5,106)
106   FORMAT(/,' Input number of neighbors used to',
     1 ' fuzzify memberships of labelled set(<10):')
      READ(5,6) KFINIT
      WRITE(5,107)
107   FORMAT(/,' Enter choice of fuzzifying:',
     1 ' 1 - Fuzzifying per nearest neighbor(s)',
     2 'technique',//,' 2 - Fuzzifying per D.',
     3 'HUNT technique(two class sets only)')
      READ(5,6) FUZCH
      END IF
      END IF
C
      IF((CHOICE.NE.4).AND.(CHOICE.NE.7)) THEN
C
C INITIALIZE THE MEMBERSHIP FUNCTIONS
C
C DO UNTIL(ALL MEMBERSHIP FUNCTIONS SET TO ZERO)
      DO 115 CLASS=1,KLASES
      DO 110 VECTOR=1,VCOUNT
      MFUNCT(CLASS,VECTOR)=0.0
110   CONTINUE
115   CONTINUE
C END DO UNTIL
C
C DO UNTIL(ALL DATA ASSIGNED COMPLETE
C          MEMBERSHIP IN THEIR CLASS)
      DO 125 CLASS=1,KLASES
      DO 120 VECTOR=START(CLASS),END(CLASS)
      MFUNCT(CLASS,VECTOR)=1.0
120   CONTINUE
125   CONTINUE
C END DO UNTIL
C
      END IF
C
C SET FIRST VECTOR IN DATA SET AS THE
C FIRST TEST VECTOR
C
C DO UNTIL(TRAINING PROCEDURE COMPLETED)
      DO 300 TEST1=1,VCOUNT
C
      IF((CHOICE.EQ.3).OR.(CHOICE.EQ.5)) THEN
      IF(CHOICE.EQ.5) THEN
      IF(FUZCH.EQ.2) THEN
      CALL FUZFY(FZFIER,FFEAT,LFEAT,VCOUNT)
      ELSE
      CALL FZFYNN(FZFIER,KLASES,VCOUNT,FFEAT,
     1 LFEAT,KFINIT,TEST1)
      END IF
      END IF
      CALL FUZNN(FZFIER,KLASES,FFEAT,LFEAT,VCOUNT,
     1 K,TEST1)
      ELSE IF((CHOICE.EQ.4).OR.(CHOICE.EQ.7)) THEN
C
C DO UNTIL(PROTOTYPES FOUND FOR ALL CLASSES)
      DO 230 CLASS=1,KLASES
C
C DO UNTIL(PROTOTYPE VECTOR FOR CURRENT
C          CLASS ZEROED)
      DO 205 FETUR=FFEAT,LFEAT
      PROTO(FETUR,CLASS)=0.0
205   CONTINUE
C END DO UNTIL
C DO UNTIL(PROTOTYPE FOR CURRENT CLASS SUMMED)
      DO 220 VECTOR=START(CLASS),END(CLASS)
      IF(VECTOR.NE.TEST1) THEN
      DO 210 FETUR=FFEAT,LFEAT
      PROTO(FETUR,CLASS)=PROTO(FETUR,CLASS)
     1 +X(FETUR,VECTOR)
210   CONTINUE
      END IF
220   CONTINUE
C END DO UNTIL
C
      IF((TEST1.GE.START(CLASS)).AND.
     1 (TEST1.LE.END(CLASS))) THEN
      COUNT=VCLASS(CLASS)-1
      ELSE
      COUNT=VCLASS(CLASS)
      END IF
C
C DO UNTIL(PROTOTYPE DIVIDED BY COUNT
C          OF TRAINING SET)
      DO 225 FETUR=FFEAT,LFEAT
      PROTO(FETUR,CLASS)=PROTO(FETUR,CLASS)/
     1 COUNT
225   CONTINUE
C END DO UNTIL
C
230   CONTINUE
C END DO UNTIL
C
      IF(CHOICE.EQ.4) THEN
      CALL FPROTO(FZFIER,KLASES,FFEAT,LFEAT,TEST1)
      ELSE
      CALL CRSPNP(KLASES,FFEAT,LFEAT,TEST1)
      END IF
C
      ELSE IF(CHOICE.EQ.6) THEN
      CALL CRSPNN(KLASES,FFEAT,LFEAT,K,VCOUNT,TEST1)
      END IF
C
300   CONTINUE
C END DO UNTIL
      GO TO 400
C END CASE # 3, 4, 5, 6 & 7
C
400   CONTINUE
C END DO CASE
C
C *** OUTPUT AND INTERPRET RESULTS ***
C
      WRITE(5,500)
500   FORMAT(//,' Where would you like results sent?',
     1 ' Enter choice:',
     2 //,5X,'5 - CONSOLE',
     3 //,5X,'6 - PRINTER')
      READ(5,506) LU
506   FORMAT(I1)
      WRITE(LU,507) DFILE,K,KFINIT
507   FORMAT(' ',' Data set: ',A20,' K=',I3,'KFINIT=',I3)
C
      IF((CHOICE.EQ.3).OR.(CHOICE.EQ.5).OR.
     1 (CHOICE.EQ.6)) THEN
      WRITE(LU,508) K
508   FORMAT(' ',' Number of neighbors used = ',I3)
      IF((CHOICE.EQ.5).AND.(FUZCH.EQ.1)) THEN
      WRITE(LU,509) KFINIT
509   FORMAT(' ','Number of neighbors used ',
     1 'for initialization = ',I3)
      END IF
      END IF
      IF((CHOICE.EQ.1).OR.(CHOICE.EQ.2)) THEN
C
C OUTPUT "EPSLON", AND NUMBER OF ITERATIONS REQUIRED
C
      WRITE(LU,602) EPSLON,ITERAT
602   FORMAT(' EPSLON (maximum update error allowed',
     1 ') = ',F7.5,' .',I3,' iterations required.')
C
      END IF
C
      IF((CHOICE.GE.2).AND.(CHOICE.LE.5)) THEN
C
C OUTPUT FUZZIFIER
C
      WRITE(LU,603) FZFIER
603   FORMAT(' ',' The weighting factor("FUZZIFIER") is:'
     1 ,F6.3,' .')
C
      END IF
C
      IF((CHOICE.NE.3).AND.(CHOICE.NE.5).AND.
     1 (CHOICE.NE.6)) THEN
C
C OUTPUT THE FINAL CLUSTER CENTERS
C
      DO 615 INDEX=1,KLASES
      WRITE(LU,612) INDEX,(PROTO(I,INDEX),
     1 I=FFEAT,LFEAT)
612   FORMAT(/,1X,'Weighted mean for class #',
     1 I2,' :',12F6.3)
615   CONTINUE
C
      END IF
C
C OUTPUT A CONFUSION MATRIX AND FIND THE INDICES
C OF MISCLASSIFIED VECTORS
C
      CALL CMTRIX(KLASES,VCOUNT,LU,CHOICE)
C
      IF((CHOICE.GE.2).AND.(CHOICE.LE.5)) THEN
C
C OUTPUT THE MEMBERSHIP FUNCTION ARRAY COMPUTED
C
      WRITE(5,669)
669   FORMAT(/,' Enter option:',//,
     1 5X,'1 - Output entire membership function ',
     2 'array',//,5X,'2 - Output only the misclassified',
     3 'vector"s membership functions')
      READ(5,506) OPTION
C
      CALL MEMBPR(KLASES,VCOUNT,LU,CHOICE,OPTION)
C
      END IF
C
C LOOP(UNTIL USER FINISHED)
670   CONTINUE
C
      WRITE(5,969)
969   FORMAT(/,' Do you want to find the ALPHA and',
     1 'BETA cutsets?(Y/N)')
      READ(5,996) REPLY
      IF(REPLY.EQ.'Y') THEN
C
      WRITE(5,971)
971   FORMAT(' ','Input "ALPHA", upper membership',
     1 ' cut-off:')
      READ(5,972) ALPHA
972   FORMAT(F4.3)
      WRITE(5,973)
973   FORMAT(' ','Input "BETA", lower non-membership',
     1 ' cut-off:')
      READ(5,972) BETA
C
      CALL CUTSET(ALPHA,BETA,KLASES,VCOUNT,CHOICE,LU)
      END IF
C
      WRITE(5,995)
995   FORMAT(/,' Do you want a plot of results?(Y/N)')
      READ(5,996) REPLY
996   FORMAT(A1)
      IF(REPLY.EQ.'Y') THEN
      CALL DOPLOT(KLASES,VCOUNT,START,END,LFEAT,CHOICE)
      END IF
C
      WRITE(5,997)
997   FORMAT(/,' Do you want another plot or cutset',
     1 '?(Y/N)')
      READ(5,996) REPLY
      IF(REPLY.EQ.'Y')GO TO 670
C END LOOP
C
      WRITE(5,998)
998   FORMAT(//,' Do you want to run another algorithm',
     1 ' using the same data set?(Y/N)')
      READ(5,996) REPLY
      IF(REPLY.EQ.'Y')GO TO 34
      WRITE(5,999)
999   FORMAT(//,' Do you want to get a different',
     1 'data set?(Y/N)')
      READ(5,996) REPLY
      IF(REPLY.EQ.'Y')GO TO 1
C END LOOP
C
      END
$OPTIMIZE
C
C     A SUBROUTINE WHICH IMPLEMENTS THE FUZZY K-MEANS
C     ALGORITHM (ALSO REFERRED TO AS THE FUZZY
C     ISODATA ALGORITHM).
C
C     WRITTEN BY:  MICHAEL R. GRAY
C     COMPLETED:  MAR 1984
C     FILENAME:  FUZKMEAN.FTN
C
C     CALLING SEQUENCE (FROM A FORTRAN ROUTINE):  CALL
C     FKMEAN(FZFIER,KLASES,FFEAT,LFEAT,VCOUNT,EPSLON,ITERAT)
C
C     INPUT VARIABLES (NOT CHANGED):
C
C       FZFIER - REAL VALUE FOR THE WEIGHTING FACTOR(M)
C                USED BY FUZZY K-MEANS
C       KLASES - INTEGER COUNT OF NUMBER OF CLUSTERS DESIRED
C       FFEAT  - INTEGER WHICH SETS FIRST FEATURE IN
C                DATA SET TO CONSIDER
C       LFEAT  - INTEGER WHICH SETS LAST FEATURE IN
C                DATA SET TO CONSIDER
C       VCOUNT - INTEGER COUNT OF SAMPLES IN DATA SET
C       EPSLON - REAL VALUE WHICH SETS THE MAXIMUM UPDATE
C                ERROR ALLOWED IN ANY MEMBERSHIP
C                ASSIGNMENT BEFORE COMPLETION
C       X      - REAL ARRAY 4 BY 242 WHICH HOLDS THE DATA
C                SAMPLES, PASSED IN LABELLED FORTRAN
C                COMMON "AREA1"
C
C     OUTPUT VARIABLES PRODUCED:
C
C       PROTO  - REAL ARRAY 4 BY 3 WHICH HOLDS THE FUZZY
C                CLUSTER CENTERS PRODUCED, PASSED
C                IN LABELLED FORTRAN COMMON "AREA2"
C       MFUNCT - REAL ARRAY 3 BY 242 WHICH HOLDS THE
C                MEMBERSHIP FUNCTION ASSIGNMENTS PRODUCED,
C                PASSED IN LABELLED FORTRAN COMMON "AREA3"
C       ITERAT - INTEGER WHICH HOLDS THE NUMBER OF
C                ITERATIONS REQUIRED
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     PSEUDO - CODE SOLUTION
C
C     ENTER FUZKMEAN
C       SET MAXIMUM NUMBER OF ITERATIONS ALLOWED
C       INITIALIZE ITERATIONS TO 0
C       DO UNTIL(MEMBERSHIP FUNCTIONS ASSIGNED STABILIZE
C                (UPDATE ERROR OF ANY MEMBERSHIP
C                ASSIGNMENT < EPSILON) OR MAXIMUM
C                NUMBER OF ITERATIONS COMPLETE)
C         COMPUTE FUZZY CLUSTER CENTERS BASED ON
C           MEMBERSHIP ARRAY INITIALIZATION
C         SET VECTOR INDEX=1
C         DO UNTIL(ALL DATA VECTORS ASSIGNED
C                  MEMBERSHIP FUNCTIONS)
C           COMPUTE DISTANCES FROM EACH FUZZY CLUSTER
C             CENTER TO CURRENT DATA VECTOR
C           ASSIGN DATA VECTOR MEMBERSHIPS IN ALL CLASSES
C             AS A FUNCTION OF DISTANCE FROM FUZZY CLUSTER
C             CENTER OF THE CLASS, KEEPING TRACK OF MAXIMUM
C             UPDATE DIFFERENCE FOR ALL MEMBERSHIP
C             ASSIGNMENTS
C           INCREMENT VECTOR INDEX
C         END DO UNTIL
C         INCREMENT ITERATION COUNT
C       END DO UNTIL
C     RETURN
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C

      SUBROUTINE FKMEAN(FZFIER,KLASES,FFEAT,LFEAT,
     1 VCOUNT,EPSLON,ITERAT)
C
      LOGICAL MATCH
      REAL MFUNCT(3,242),X(4,242),NEWMF,PROTO(4,3),XDIST(3)
      INTEGER FFEAT,LFEAT,CLASS,VECTOR,FETUR,VCOUNT
      COMMON /AREA1/X /AREA2/PROTO /AREA3/MFUNCT
C
C *** INITIALIZE ***
C
C SET MAXIMUM ITERATIONS
C
      MAXITR=50
C
C SET ITERATION TO ZERO INITIALLY
C
      ITERAT=0
C
C COMPUTE POWER WHICH IS A FUNCTION OF THE "FUZZIFIER"
C
      POWER=1.0/(FZFIER-1.0)
C
C *** BEGIN ITERATIONS ***
C
C DO UNTIL(MAXIMUM DIFFERENCE IN MEMBERSHIP
C          FUNCTION UPDATES LESS THAN EPSLON
C          OR MAXIMUM ITERATIONS PERFORMED)
1     CONTINUE
C
C COMPUTE WEIGHTED MEANS
C
C DO UNTIL(WEIGHTED MEAN FOR EACH CLASS COMPUTED)
      DO 7 CLASS=1,KLASES
C
C DO UNTIL(MEAN VECTOR # "CLASS" ZEROED)
      DO 2 FETUR=FFEAT,LFEAT
      PROTO(FETUR,CLASS)=0.0
2     CONTINUE
C END DO UNTIL
C
      DENOM=0.0
C
C DO UNTIL(ALL VECTORS OF CLASS "CLASS" INCLUDED)
      DO 4 VECTOR=1,VCOUNT
      FUZZED=MFUNCT(CLASS,VECTOR)**FZFIER
      DENOM=DENOM+FUZZED
C
C DO UNTIL(VECTOR NUMBER "VECTOR" INCLUDED)
      DO 3 FETUR=FFEAT,LFEAT
      PROTO(FETUR,CLASS)=PROTO(FETUR,CLASS)+
     1 X(FETUR,VECTOR)*FUZZED
3     CONTINUE
C END DO UNTIL
C
4     CONTINUE
C END DO UNTIL
C
C DO UNTIL(MEAN VECTOR DIVIDED BY "DENOM")
      DO 6 FETUR=FFEAT,LFEAT
      PROTO(FETUR,CLASS)=PROTO(FETUR,CLASS)/DENOM
6     CONTINUE
C END DO UNTIL
C
7     CONTINUE
C END DO UNTIL
C
      DIFMAX=0.0
C
C DO UNTIL(MEMBERSHIP FUNCTIONS UPDATED AND
C          MAXIMUM OLD TO NEW MEMBERSHIP
C          FUNCTION DIFFERENCE FOUND FOR
C          ALL MEMBERSHIP FUNCTIONS)
      DO 14 VECTOR=1,VCOUNT
      MATCH=.FALSE.
      DSUM=0.0
      CLASS=1
C
C DO UNTIL(VECTOR "VECTOR" COMPARED TO ALL
C          MEANS OR MATCH FOUND)
8     CONTINUE
C
C COMPUTE DISTANCE - VECTOR NUMBER "VECTOR"
C TO MEAN NUMBER "CLASS"
C
      XDIST(CLASS)=0.0
C
C DO UNTIL(DISTANCE SQUARED COMPUTED)
      DO 9 FETUR=FFEAT,LFEAT
      TEMP=X(FETUR,VECTOR)-PROTO(FETUR,CLASS)
      XDIST(CLASS)=XDIST(CLASS)+TEMP*TEMP
9     CONTINUE
C END DO UNTIL
C
      IF(XDIST(CLASS).EQ.0.0) THEN
      MATCH=.TRUE.
      MCLASS=CLASS
      ELSE
      XDIST(CLASS)=1.0/XDIST(CLASS)**POWER
      DSUM=DSUM+XDIST(CLASS)
      END IF
      CLASS=CLASS+1
      IF((.NOT.MATCH).AND.(CLASS.LE.KLASES))GO TO 8
C END DO UNTIL
C
      IF(.NOT.MATCH) THEN
C
C DO UNTIL(NEW MEMBERSHIP FUNCTIONS ASSIGNED,
C          AND MAXIMUM OLD TO NEW MEMBERSHIP
C          FUNCTION DIFFERENCE FOUND FOR
C          CURRENT VECTOR)
      DO 10 CLASS=1,KLASES
      NEWMF=XDIST(CLASS)/DSUM
      DIFF=ABS(NEWMF-MFUNCT(CLASS,VECTOR))
      MFUNCT(CLASS,VECTOR)=NEWMF
C
C FIND MAXIMUM OF ALL DIFFERENCES IN OLD
C TO NEW MEMBERSHIP FUNCTIONS
C
      IF(DIFF.GT.DIFMAX) THEN
      DIFMAX=DIFF
      END IF
C
10    CONTINUE
C END DO UNTIL
C
      ELSE
C
C DO UNTIL(NEW MEMBERSHIP FUNCTIONS ASSIGNED,
C          AND MAXIMUM OLD TO NEW MEMBERSHIP
C          FUNCTION DIFFERENCE FOUND FOR
C          CURRENT VECTOR)
      DO 11 CLASS=1,KLASES
      IF(CLASS.EQ.MCLASS) THEN
      NEWMF=1.0
      ELSE
      NEWMF=0.0
      END IF
      DIFF=ABS(NEWMF-MFUNCT(CLASS,VECTOR))
      MFUNCT(CLASS,VECTOR)=NEWMF
C
C FIND MAXIMUM OF ALL DIFFERENCES IN OLD
C TO NEW MEMBERSHIP FUNCTIONS
C
      IF(DIFF.GT.DIFMAX) THEN
      DIFMAX=DIFF
      END IF
C
11    CONTINUE
C END DO UNTIL
C
      END IF
C
14    CONTINUE
C END DO UNTIL
C
C INCREMENT ITERATION COUNT
C
      ITERAT=ITERAT+1
C
      IF((DIFMAX.GT.EPSLON).AND.
     1 (ITERAT.LE.MAXITR))GO TO 1
C END DO UNTIL
C
      RETURN
      END


$OPTIMIZE
C
C     A SUBROUTINE WHICH IMPLEMENTS THE HARD K-MEANS
C     ALGORITHM USING MEMBERSHIP FUNCTIONS( =0,1 )
C     INSTEAD OF CLUSTER ASSIGNMENT.
C
C     WRITTEN BY:  MICHAEL R. GRAY
C     COMPLETED:  MAY 1984
C     FILENAME:  CRSKMEAN.FTN
C
C     CALLING SEQUENCE (FROM A FORTRAN ROUTINE):  CALL
C     CRMEAN(KLASES,VCOUNT,FFEAT,LFEAT,EPSLON,ITERAT)
C
C     INPUT VARIABLES (NOT CHANGED):
C
C       KLASES - INTEGER COUNT OF CLASSES,
C                OR CLUSTERS TO PRODUCE
C       VCOUNT - INTEGER COUNT OF SAMPLE VECTORS IN DATA SET
C       FFEAT  - INTEGER WHICH SETS FIRST FEATURE IN DATA
C                VECTORS TO CONSIDER
C       LFEAT  - INTEGER WHICH SETS LAST FEATURE IN DATA
C                VECTORS TO CONSIDER
C       EPSLON - REAL VALUE WHICH SETS MAXIMUM ERROR
C                ALLOWED IN CLUSTER UPDATE BEFORE
C                STOPPING ITERATIONS
C       X      - REAL ARRAY 4 BY 242 WHICH HOLDS THE DATA
C                SAMPLES, PASSED IN LABELLED FORTRAN
C                COMMON BLOCK "AREA1"
C
C     OUTPUT VARIABLES PRODUCED:
C
C       PROTO  - REAL ARRAY 4 BY 3 WHICH HOLDS THE CLUSTER
C                CENTERS PRODUCED, PASSED IN LABELLED
C                FORTRAN COMMON BLOCK "AREA2"
C       MFUNCT - REAL ARRAY 3 BY 242 WHICH HOLDS THE
C                MEMBERSHIP FUNCTION ASSIGNMENTS (0 OR 1)
C                PASSED IN LABELLED FORTRAN
C                COMMON BLOCK "AREA3"
C       ITERAT - INTEGER WHICH HOLDS THE NUMBER
C                OF ITERATIONS REQUIRED
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     PSEUDO - CODE SOLUTION
C
C     ENTER CRMEAN
C       SET MAXIMUM NUMBER OF ITERATIONS ALLOWED
C       COMPUTE INITIAL CLUSTER MEANS
C       INITIALIZE ITERATIONS TO 0
C       DO UNTIL(CLUSTER MEANS STABILIZE (UPDATE ERROR
C                < EPSILON) OR MAXIMUM NUMBER
C                OF ITERATIONS COMPLETE)
C         SET VECTOR INDEX=1
C         DO UNTIL(ALL VECTORS IN DATA SET ASSIGNED
C                  TO CLUSTER OF CLOSEST MEAN)
C           SET CLUSTER-MEAN INDEX=1
C           DO UNTIL(CLOSEST CLUSTER MEAN TO
C                    CURRENT VECTOR FOUND)
C             COMPUTE DISTANCE - CURRENT VECTOR
C               TO CURRENT CLUSTER MEAN
C             IF (FIRST CLUSTER MEAN IN LIST) THEN
C               SET MINIMUM DISTANCE TO DISTANCE COMPUTED
C               SET CLOSEST INDEX TO 1
C             ELSE IF (DISTANCE LESS THAN PREVIOUS
C                      MINIMUM) THEN
C               SET MINIMUM DISTANCE TO NEW MINIMUM
C               SET CLOSEST INDEX TO THAT OF NEW MINIMUM
C             END IF
C             INCREMENT CLUSTER-MEAN INDEX
C           END DO UNTIL
C           ASSIGN CURRENT VECTOR TO CLUSTER OF CLOSEST MEAN
C           INCREMENT VECTOR INDEX
C         END DO UNTIL
C         COMPUTE NEW CLUSTER MEANS BASED ON NEW
C           ASSIGNMENT AND FIND MAXIMUM UPDATE
C           ERROR FOR ALL CLUSTER MEANS
C         INCREMENT ITERATION COUNT
C       END DO UNTIL
C     RETURN
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

      SUBROUTINE CRMEAN(KLASES,VCOUNT,FFEAT,LFEAT,
     1                  EPSLON,ITERAT)
C
      REAL MFUNCT(3,242),X(4,242),PROTO(4,3),UPDATE(4)
      INTEGER CLOSER,VCOUNT,FFEAT,LFEAT,CLASS,VECTOR
      INTEGER FETUR,CCOUNT
C
      COMMON /AREA1/X /AREA2/PROTO /AREA3/MFUNCT
C
C     *** INITIALIZE ***
C
      MAXITR=50
C
C     COMPUTE CLUSTER MEANS BASED ON INITIAL
C     MEMBERSHIP ASSIGNMENTS
C
C     DO UNTIL(CLUSTER MEAN FOR EACH CLASS COMPUTED)
      DO 5 CLASS=1,KLASES
C
C        DO UNTIL(MEAN VECTOR FOR "CLASS" ZEROED)
         DO 1 FETUR=FFEAT,LFEAT
            PROTO(FETUR,CLASS)=0.0
    1    CONTINUE
C        END DO UNTIL
C
         CCOUNT=0
C
C        DO UNTIL(MEAN FOR CURRENT CLASS COMPUTED)
         DO 3 VECTOR=1,VCOUNT
            IF(MFUNCT(CLASS,VECTOR).EQ.1.0) THEN
               CCOUNT=CCOUNT+1
C              DO UNTIL(VECTOR NUMBER "VECTOR" INCLUDED)
               DO 2 FETUR=FFEAT,LFEAT
                  PROTO(FETUR,CLASS)=PROTO(FETUR,CLASS)+
     1                               X(FETUR,VECTOR)
    2          CONTINUE
C              END DO UNTIL
            END IF
    3    CONTINUE
C        END DO UNTIL
C
C        DO UNTIL(MEAN VECTOR DIVIDED BY "CCOUNT")
         DO 4 FETUR=FFEAT,LFEAT
            PROTO(FETUR,CLASS)=PROTO(FETUR,CLASS)/CCOUNT
    4    CONTINUE
C        END DO UNTIL
C
    5 CONTINUE
C     END DO UNTIL
C
C     SET ITERATIONS TO ZERO INITIALLY
      ITERAT=0
C
C     *** BEGIN ITERATIONS ***
C
C     DO UNTIL(CLUSTER MEANS STABILIZE OR MAXIMUM
C              ITERATIONS COMPLETED)
    6 CONTINUE
C
C     DO UNTIL(ALL VECTORS ASSIGNED MEMBERSHIP
C              VALUE=1.0 IN CLOSEST CLASS (MINIMUM
C              DISTANCE TO PROTO) AND VALUE=0.0
C              IN OTHER CLASSES)
      DO 10 VECTOR=1,VCOUNT
C
C        DO UNTIL(MINIMUM DISTANCE - CURRENT VECTOR
C                 TO ALL MEANS FOUND)
         DO 8 CLASS=1,KLASES
            XDIST=0.0
C           DO UNTIL(DISTANCE SQUARED COMPUTED FOR
C                    CURRENT VECTOR AND CLASS PROTO)
            DO 7 FETUR=FFEAT,LFEAT
               TEMP=X(FETUR,VECTOR)-PROTO(FETUR,CLASS)
               XDIST=XDIST+TEMP*TEMP
    7       CONTINUE
C           END DO UNTIL
C
            IF(CLASS.EQ.1) THEN
               DISTMN=XDIST
               CLOSER=1
            ELSE IF(XDIST.LT.DISTMN) THEN
               DISTMN=XDIST
               CLOSER=CLASS
            END IF
    8    CONTINUE
C        END DO UNTIL
C
C        DO UNTIL(ALL MEMBERSHIP FUNCTIONS OF
C                 CURRENT VECTOR ZEROED)
         DO 9 CLASS=1,KLASES
            MFUNCT(CLASS,VECTOR)=0.0
    9    CONTINUE
C        END DO UNTIL
C
C        ASSIGN VECTOR TO CLASS NUMBER "CLOSER"
         MFUNCT(CLOSER,VECTOR)=1.0
C
   10 CONTINUE
C     END DO UNTIL
C
C     DO UNTIL(NEW CLASS MEANS COMPUTED AND
C              MAXIMUM UPDATE DIFFERENCE FOUND)
      DO 16 CLASS=1,KLASES
         CCOUNT=0
C
C        DO UNTIL(TEMPORARY MEAN VECTOR ZEROED)
         DO 11 FETUR=FFEAT,LFEAT
            UPDATE(FETUR)=0.0
   11    CONTINUE
C        END DO UNTIL
C
C        DO UNTIL(CLASS MEAN FOR CURRENT CLASS SUMMED)
         DO 13 VECTOR=1,VCOUNT
            IF(MFUNCT(CLASS,VECTOR).EQ.1.0) THEN
               CCOUNT=CCOUNT+1
C              DO UNTIL(VECTOR "VECTOR" INCLUDED)
               DO 12 FETUR=FFEAT,LFEAT
                  UPDATE(FETUR)=UPDATE(FETUR)+
     1                          X(FETUR,VECTOR)
   12          CONTINUE
C              END DO UNTIL
            END IF
   13    CONTINUE
C        END DO UNTIL
C
C        DO UNTIL(MEAN VECTOR DIVIDED BY "CCOUNT")
         DO 14 FETUR=FFEAT,LFEAT
            UPDATE(FETUR)=UPDATE(FETUR)/CCOUNT
   14    CONTINUE
C        END DO UNTIL
C
         DIST=0.0
C
C        DO UNTIL(DISTANCE SQUARED UPDATE MEAN TO
C                 PREVIOUS MEAN FOUND, AND NEW MEAN
C                 ASSIGNED AS THE UPDATE VECTOR FOUND)
         DO 15 FETUR=FFEAT,LFEAT
            TEMP=PROTO(FETUR,CLASS)-UPDATE(FETUR)
            DIST=DIST+TEMP*TEMP
            PROTO(FETUR,CLASS)=UPDATE(FETUR)
   15    CONTINUE
C        END DO UNTIL
C
         IF(CLASS.EQ.1) THEN
            DIFMAX=DIST
         ELSE IF(DIST.GT.DIFMAX) THEN
            DIFMAX=DIST
         END IF
C
   16 CONTINUE
C     END DO UNTIL
C
C     INCREMENT ITERATION COUNT
      ITERAT=ITERAT+1
C
      IF((DIFMAX.GT.EPSLON).AND.(ITERAT.LE.MAXITR))
     1   GO TO 6
C     END DO UNTIL
C
      RETURN
      END
$OPTIMIZE

A  SUBROUTINE  WHICH  IMPLEMENTS  A  FUZZY  VERSION 
OF  THE  NEAREST  NEIGHBOR  ALGORITHM.  THE  RESULT 
IS  TO  ASSIGN  MEMBERSHIP  FUNCTION  VALUES  TO  THE 
VECTOR  TO  BE  CLASSIFIED  INSTEAD  OF  ASSIGNING 
THE  VECTOR  TO  ONE  OF  THE  CLASSES  REPRESENTED 
BY  THE  LABELLED  DATA  USED. 

WRITTEN  BY:  MICHAEL  R.  GRAY 

COMPLETED:  MAR  84 

FILENAME:  FUZNEARN.FTN 

CALLING SEQUENCE (FROM A FORTRAN ROUTINE): CALL
FUZNN(FZFIER,KLASES,FFEAT,LFEAT,VCOUNT,K,TEST1)

INPUT VARIABLES (NOT CHANGED):

FZFIER  -  REAL  VALUE  OF  THE  WEIGHTING  FACTOR 
USED  IN  THE  ALGORITHM 

KLASES  -  INTEGER  COUNT  OF  NUMBER  OF  CLASSES 

FFEAT  -  INTEGER  WHICH  SETS  FIRST  FEATURE  IN 
DATA  VECTORS  TO  CONSIDER 

LFEAT  -  INTEGER  WHICH  SETS  LAST  FEATURE  IN 
DATA  VECTORS  TO  CONSIDER 

VCOUNT  -  INTEGER  COUNT  OF  NUMBER  OF  DATA 
VECTOR  SAMPLES 

K  -  INTEGER  COUNT  OF  NUMBER  OF  NEAREST  NEIGHBORS 
TO  USE  FOR  MEMBERSHIP  FUNCTION  ASSIGNMENTS 

TEST1  -  INTEGER  INDEX  WHICH  INDICATES  WHICH  OF 
THE  DATA  VECTORS  IS  THE  CURRENT  TEST 
SAMPLE  TO  BE  ASSIGNED  MEMBERSHIPS 

X  -  REAL  ARRAY  4  BY  242  WHICH  HOLDS  ALL  DATA  VECTORS, 
PASSED  IN  LABELLED  FORTRAN  COMMON  "AREA1" 

MFUNCT  -  REAL  ARRAY  3  BY  242  WHICH  HOLDS 

MEMBERSHIPS  OF  LABELLED  SAMPLES  USED 
IN  ASSIGNMENT  OF  "TEST1"’S  MEMBERSHIPS, 

PASSED  IN  LABELLED  FORTRAN  COMMON  "AREA3" 

OUTPUT  VARIABLES: 

NMFUNC  -  REAL  ARRAY  3  BY  242  WHICH  HOLDS  THE 
MEMBERSHIP  ASSIGNMENTS  PRODUCED, 

PASSED  IN  LABELLED  FORTRAN  COMMON  "AREA5" 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

PSEUDO  -  CODE  SOLUTION 


ENTER FUZNEARN

   INITIALIZE LABELLED SAMPLE INDEX
   INITIALIZE MATCH = NO
   INITIALIZE NEAREST NEIGHBOR COUNTER
   DO UNTIL("K" NEAREST NEIGHBORS FOUND
            OR A MATCH FOUND)
      COMPUTE DISTANCE FROM TEST VECTOR
         TO CURRENT LABELLED SAMPLE
      IF (DISTANCE = 0) THEN
         MATCH = YES
      ELSE IF (NEAREST NEIGHBOR COUNTER <= "K") THEN
         INCLUDE CURRENT LABELLED SAMPLE
            IN LIST OF NEAREST NEIGHBORS
         INCREMENT NEAREST NEIGHBOR COUNTER
      ELSE IF (DISTANCE LESS THAN THAT
               OF ANY PREVIOUS NEAREST NEIGHBOR) THEN
         DELETE FARTHEST OF PREVIOUS NEAREST NEIGHBORS
         INCLUDE NEW NEAREST NEIGHBOR IN LIST
      END IF
      INCREMENT LABELLED SAMPLE INDEX
   END DO UNTIL
   IF (MATCH = YES) THEN
      ASSIGN MEMBERSHIPS OF MATCHING LABELLED
         SAMPLE TO TEST VECTOR
   ELSE
      ASSIGN MEMBERSHIPS AS A FUNCTION OF INVERSE
         DISTANCES FROM NEAREST NEIGHBORS
   END IF
RETURN 
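
As an illustration of the membership assignment described above, the following Python sketch computes fuzzy K-nearest-neighbor memberships for one test vector. It is a simplified stand-in, not the FUZNN routine itself: the function name `fuzzy_knn`, the list-based arguments, and the use of a full sort in place of the incremental neighbor list are assumptions. As in the listing, squared distances are raised to the power 1/(FZFIER-1), and an exact match simply copies the memberships of the matching sample.

```python
def fuzzy_knn(test, samples, memberships, k, fuzzifier=2.0):
    """Return class memberships of `test` from its k nearest labelled samples."""
    power = 1.0 / (fuzzifier - 1.0)
    dists = []
    for i, s in enumerate(samples):
        d = sum((a - b) ** 2 for a, b in zip(s, test))  # squared distance
        if d == 0.0:
            return list(memberships[i])  # exact match: copy its memberships
        dists.append((d, i))
    nearest = sorted(dists)[:k]
    # Weight each neighbor by the inverse of its (squared) distance.
    weights = [1.0 / d ** power for d, _ in nearest]
    total = sum(weights)
    n_classes = len(memberships[0])
    return [
        sum(w * memberships[i][c] for w, (_, i) in zip(weights, nearest)) / total
        for c in range(n_classes)
    ]
```

Because the weights are normalized by their sum, the returned memberships add to one whenever each neighbor's own memberships do.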


      SUBROUTINE FUZNN(FZFIER,KLASES,FFEAT,LFEAT,
     1                 VCOUNT,K,TEST1)
C
      LOGICAL MATCH
      REAL MFUNCT(3,242),X(4,242),NMFUNC(3,242),DNEAR(10)
      INTEGER FFEAT,CLASS,NEAR(10),FETUR
      INTEGER TEST1,COUNTR,VTRAIN,VCOUNT
      COMMON /AREA3/MFUNCT /AREA1/X /AREA5/NMFUNC
C
C     COMPUTE POWER DEPENDENT ON "FUZZIFIER"
      POWER=1.0/(FZFIER-1.0)
C
      MATCH=.FALSE.
      DSTMAX=0.0
      VTRAIN=1
      COUNTR=1
C
C     DO UNTIL("K" NEAREST NEIGHBORS FOUND
C              OR A MATCH OCCURS)
    1 CONTINUE
      DIST=0.0
      IF(VTRAIN.NE.TEST1) THEN
C
C        DO UNTIL(DISTANCE COMPUTED)
         DO 3 FETUR=FFEAT,LFEAT
            TEMP=X(FETUR,VTRAIN)-X(FETUR,TEST1)
            DIST=DIST+TEMP*TEMP
    3    CONTINUE
C        END DO UNTIL
C
         IF(DIST.EQ.0.0) THEN
            MATCH=.TRUE.
            MATNUM=VTRAIN
         ELSE IF(COUNTR.LE.K) THEN
            DNEAR(COUNTR)=DIST
            NEAR(COUNTR)=VTRAIN
            IF(DNEAR(COUNTR).GT.DSTMAX) THEN
               DSTMAX=DNEAR(COUNTR)
               MAXNER=COUNTR
            END IF
            COUNTR=COUNTR+1
         ELSE IF(DIST.LT.DSTMAX) THEN
            DSTMAX=DIST
            DNEAR(MAXNER)=DIST
            NEAR(MAXNER)=VTRAIN
C           DO UNTIL(NEW MAXIMUM DISTANCE OF
C                    K-NEAREST NEIGHBORS FOUND)
            DO 4 INDEX=1,K
               IF(DNEAR(INDEX).GT.DSTMAX) THEN
                  DSTMAX=DNEAR(INDEX)
                  MAXNER=INDEX
               END IF
    4       CONTINUE
C           END DO UNTIL
         END IF
      END IF
C
      VTRAIN=VTRAIN+1
C
      IF((.NOT.MATCH).AND.(VTRAIN.LE.VCOUNT))
     1   GO TO 1
C     END DO UNTIL
C
      IF(.NOT.MATCH) THEN
C
C        DO UNTIL(MEMBERSHIP FUNCTIONS ASSIGNED
C                 TO VECTOR NUMBER "TEST1")
         DO 6 CLASS=1,KLASES
            NMFUNC(CLASS,TEST1)=0.0
            SUM=0.0
C           DO UNTIL(MEMBERSHIP FUNCTION COMPUTED
C                    FOR CLASS NUMBER "CLASS")
            DO 5 INDEX=1,K
               WDIST=1.0/DNEAR(INDEX)**POWER
               SUM=SUM+WDIST
               NMFUNC(CLASS,TEST1)=NMFUNC(CLASS,TEST1)+
     1            MFUNCT(CLASS,NEAR(INDEX))*WDIST
    5       CONTINUE
C           END DO UNTIL
            NMFUNC(CLASS,TEST1)=NMFUNC(CLASS,TEST1)/SUM
    6    CONTINUE
C        END DO UNTIL
C
      ELSE
C
C        DO UNTIL(MEMBERSHIP FUNCTIONS ASSIGNED)
         DO 7 CLASS=1,KLASES
            NMFUNC(CLASS,TEST1)=MFUNCT(CLASS,MATNUM)
    7    CONTINUE
C        END DO UNTIL
C
      END IF
C
      RETURN
      END
$OPTIMIZE

A  SUBROUTINE  WHICH  IMPLEMENTS  A  K-NEAREST 
NEIGHBOR  ALGORITHM.  VECTORS  ARE  ASSIGNED 
TO A CLASS VIA MEMBERSHIP FUNCTION ASSIGNMENT
WITH POSSIBLE VALUES = {0,1}

WRITTEN  BY:  MICHAEL  R.  GRAY 

COMPLETED:  MAR  84 

FILENAME:  CRISPNN.FTN 

CALLING SEQUENCE (FROM A FORTRAN ROUTINE):

CALL CRSPNN(KLASES,FFEAT,LFEAT,K,VCOUNT,TEST1)

INPUT VARIABLES (NOT CHANGED):

KLASES  -  INTEGER  COUNT  OF  NUMBER  OF  CLASSES 

FFEAT  -  INTEGER  WHICH  SETS  FIRST  FEATURE  IN 
DATA  VECTORS  TO  CONSIDER 

LFEAT  -  INTEGER  WHICH  SETS  LAST  FEATURE  IN 
DATA  VECTORS  TO  CONSIDER 

K  -  INTEGER  COUNT  OF  NUMBER  OF  NEAREST 
NEIGHBORS  TO  USE  FOR  CLASSIFICATION 

VCOUNT  -  INTEGER  COUNT  OF  NUMBER  OF 
DATA  VECTORS  IN  DATA  SET 

TEST1  -  INTEGER  INDEX  OF  SAMPLE  WHICH 
IS  BEING  CLASSIFIED 

X - REAL ARRAY 4 BY 242 WHICH HOLDS ALL DATA VECTORS,
PASSED IN LABELLED FORTRAN COMMON "AREA1"

MFUNCT - REAL ARRAY 3 BY 242 WHICH HOLDS MEMBERSHIPS OF
LABELLED SAMPLES USED IN CLASSIFICATION OF
"TEST1", PASSED IN LABELLED FORTRAN
COMMON "AREA3"

OUTPUT VARIABLES:

NMFUNC - REAL ARRAY 3 BY 242 WHICH HOLDS THE
MEMBERSHIP ASSIGNMENTS OF SAMPLE
"TEST1", PASSED IN LABELLED
FORTRAN COMMON "AREA5"

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

PSEUDO  -  CODE  SOLUTION 

ENTER CRSPNN

   INITIALIZE LABELLED SAMPLE INDEX
   INITIALIZE MATCH = NO
   INITIALIZE NEAREST NEIGHBOR COUNTER TO 0
   DO UNTIL("K" NEAREST NEIGHBORS FOUND
            OR A MATCH FOUND)
      COMPUTE DISTANCE FROM TEST VECTOR
         TO CURRENT LABELLED SAMPLE
      IF (DISTANCE = 0) THEN
         MATCH = YES
      ELSE IF (NEAREST NEIGHBOR COUNTER <= "K") THEN
         INCLUDE CURRENT LABELLED SAMPLE IN
            LIST OF K NEAREST NEIGHBORS
         INCREMENT NEAREST NEIGHBOR COUNTER
      ELSE IF (DISTANCE LESS THAN THAT OF ANY
               PREVIOUS NEAREST NEIGHBOR) THEN
         DELETE FARTHEST OF PREVIOUS NEAREST NEIGHBORS
         INCLUDE NEW NEAREST NEIGHBOR IN LIST
      END IF
      INCREMENT LABELLED SAMPLE INDEX
   END DO UNTIL
   IF (MATCH = NO) THEN
      COUNT NUMBER OF NEAREST NEIGHBORS
         FROM EACH CLASS
      IF (A TIE FOR MAXIMUM NUMBER BETWEEN CLASSES) THEN
         COMPUTE SUM OF DISTANCES FROM TEST VECTOR
            TO ALL NEIGHBORS IN EACH TYING CLASS
         IF (TIE IN SUMS COMPUTED) THEN
            ASSIGN TEST VECTOR TO LAST CLASS WHICH TIED
         ELSE
            ASSIGN TEST VECTOR TO CLASS OF MINIMUM SUM
         END IF
      ELSE
         ASSIGN TEST VECTOR TO CLASS OF
            MAXIMUM NUMBER OF NEIGHBORS
      END IF
   ELSE
      ASSIGN TEST VECTOR TO CLASS OF MATCH
   END IF
RETURN 
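
The voting and tie-breaking rules in the pseudo-code above can be illustrated with a short Python sketch. It is only an approximation of the routine for explanatory purposes: the name `crisp_knn`, the integer class labels (in place of the START/END index ranges), and the full sort used to find the neighbors are assumptions. The tie rule follows the pseudo-code: among classes with equal neighbor counts the smallest sum of distances wins, and a tie in sums goes to the last tying class.

```python
def crisp_knn(test, samples, labels, k, n_classes):
    """Return the class index assigned to `test` by majority vote."""
    dists = []
    for i, s in enumerate(samples):
        d = sum((a - b) ** 2 for a, b in zip(s, test))  # squared distance
        if d == 0.0:
            return labels[i]  # exact match decides the class directly
        dists.append((d, i))
    nearest = sorted(dists)[:k]
    counts = [0] * n_classes
    dsum = [0.0] * n_classes
    for d, i in nearest:
        counts[labels[i]] += 1
        dsum[labels[i]] += d
    top = max(counts)
    tied = [c for c in range(n_classes) if counts[c] == top]
    best = tied[0]
    for c in tied[1:]:
        # "<=" sends a tie in the distance sums to the last tying class.
        if dsum[c] <= dsum[best]:
            best = c
    return best
```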

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 
      SUBROUTINE CRSPNN(KLASES,FFEAT,LFEAT,K,VCOUNT,TEST1)
      LOGICAL MATCH
      REAL MFUNCT(3,242),X(4,242),NMFUNC(3,242)
      REAL DNEAR(10),DSUM(10)
      INTEGER FFEAT,CLASS,VECTOR,NEAR(10),FEATUR
      INTEGER VCOUNT,NEARCL(10),END(3),VTRAIN
      INTEGER COUNTR,COUNT(3),MAXNUM(10),TEST1,START(3)
      COMMON /AREA3/MFUNCT /AREA1/X /AREA5/NMFUNC
      COMMON /AREA6/START,END
C
      MATCH=.FALSE.
      DSTMAX=0.0
      VTRAIN=1
      COUNTR=1
C
C     DO UNTIL("K" NEAREST NEIGHBORS FOUND
C              OR A MATCH OCCURS)
    1 CONTINUE
      IF(VTRAIN.NE.TEST1) THEN
         DIST=0.0
C        DO UNTIL(DISTANCE COMPUTED)
         DO 3 FEATUR=FFEAT,LFEAT
            TEMP=X(FEATUR,VTRAIN)-X(FEATUR,TEST1)
            DIST=DIST+TEMP*TEMP
    3    CONTINUE
C        END DO UNTIL
C
         IF(DIST.EQ.0.0) THEN
            MATCH=.TRUE.
            MATNUM=VTRAIN
         ELSE IF(COUNTR.LE.K) THEN
            DNEAR(COUNTR)=DIST
            NEAR(COUNTR)=VTRAIN
            IF(DNEAR(COUNTR).GT.DSTMAX) THEN
               DSTMAX=DNEAR(COUNTR)
               MAXNER=COUNTR
            END IF
            COUNTR=COUNTR+1
         ELSE IF(DIST.LT.DSTMAX) THEN
            DSTMAX=DIST
            DNEAR(MAXNER)=DIST
            NEAR(MAXNER)=VTRAIN
C           DO UNTIL(NEW MAXIMUM DISTANCE OF K
C                    NEAREST NEIGHBORS FOUND)
            DO 4 INDEX=1,K
               IF(DNEAR(INDEX).GT.DSTMAX) THEN
                  DSTMAX=DNEAR(INDEX)
                  MAXNER=INDEX
               END IF
    4       CONTINUE
C           END DO UNTIL
         END IF
      END IF
C
      VTRAIN=VTRAIN+1
C
      IF((.NOT.MATCH).AND.(VTRAIN.LE.VCOUNT))
     1   GO TO 1
C     END DO UNTIL
C
      IF(.NOT.MATCH) THEN
C
C        DO UNTIL(COUNTS OF NEAREST NEIGHBORS' CLASSES
C                 AND CURRENT VECTOR'S MEMBERSHIPS ZEROED)
         DO 5 CLASS=1,KLASES
            COUNT(CLASS)=0
            NMFUNC(CLASS,TEST1)=0.0
    5    CONTINUE
C        END DO UNTIL
C
C        DO UNTIL(CLASS NUMBER OF K-NEAREST
C                 NEIGHBORS, AND COUNT OF
C                 NEAREST NEIGHBORS IN A CLASS FOUND)
         DO 7 INDEX=1,K
C           DO UNTIL("INDEX" NEIGHBOR'S CLASS FOUND)
            DO 6 CLASS=1,KLASES
               IF((NEAR(INDEX).GE.START(CLASS)).AND.
     1            (NEAR(INDEX).LE.END(CLASS))) THEN
                  COUNT(CLASS)=COUNT(CLASS)+1
                  NEARCL(INDEX)=CLASS
               END IF
    6       CONTINUE
C           END DO UNTIL
    7    CONTINUE
C        END DO UNTIL
C
C        DO UNTIL(COUNTS SEARCHED FOR MAXIMUM(S))
         DO 8 CLASS=1,KLASES
            IF(CLASS.EQ.1) THEN
               MAX=COUNT(CLASS)
               MAXCNT=1
               MAXNUM(MAXCNT)=CLASS
            ELSE IF(COUNT(CLASS).GT.MAX) THEN
C              A NEW MAXIMUM DISCARDS ANY EARLIER TIES
               MAX=COUNT(CLASS)
               MAXCNT=1
               MAXNUM(MAXCNT)=CLASS
            ELSE IF(COUNT(CLASS).EQ.MAX) THEN
               MAXCNT=MAXCNT+1
               MAXNUM(MAXCNT)=CLASS
            END IF
    8    CONTINUE
C        END DO UNTIL
C
         IF(MAXCNT.EQ.1) THEN
            NMFUNC(MAXNUM(MAXCNT),TEST1)=1.0
         ELSE
C
C           DO UNTIL(SUM OF DISTANCES OF NEIGHBORS
C                    IN EACH CLASS WHICH TIED
C                    FOR A MAJORITY COMPUTED)
            DO 10 INDEX=1,MAXCNT
               DSUM(INDEX)=0.0
               DO 9 NEIBOR=1,K
                  IF(NEARCL(NEIBOR).EQ.MAXNUM(INDEX)) THEN
                     DSUM(INDEX)=DSUM(INDEX)+DNEAR(NEIBOR)
                  END IF
    9          CONTINUE
   10       CONTINUE
C           END DO UNTIL
C
C           DO UNTIL(MINIMUM OF SUMS OF DISTANCES
C                    COMPUTED ABOVE FOUND)
            DO 11 INDEX=1,MAXCNT
               IF(INDEX.EQ.1) THEN
                  DMIN=DSUM(INDEX)
                  MIN=INDEX
               ELSE IF(DSUM(INDEX).LE.DMIN) THEN
                  DMIN=DSUM(INDEX)
                  MIN=INDEX
               END IF
   11       CONTINUE
C           END DO UNTIL
C
C           ASSIGN VECTOR TO THE TIED CLASS WITH MINIMUM TOTAL
C           DISTANCE (TIE GOES TO LAST MINIMUM FOUND)
            NMFUNC(MAXNUM(MIN),TEST1)=1.0
         END IF
C
      ELSE
C
C        DO UNTIL(MEMBERSHIP FUNCTIONS ASSIGNED)
         DO 12 CLASS=1,KLASES
            NMFUNC(CLASS,TEST1)=MFUNCT(CLASS,MATNUM)
   12    CONTINUE
C        END DO UNTIL
C
      END IF
C
      RETURN
      END
$OPTIMIZE

A  ROUTINE  WHICH  IMPLEMENTS  A  FUZZY  VERSION 
OF  THE  NEAREST  PROTOTYPE  ALGORITHM.  THE 
RESULT  IS  TO  ASSIGN  MEMBERSHIP  FUNCTION 
VALUES  TO  THE  VECTOR  TO  BE  CLASSIFIED 
INSTEAD  OF  ASSIGNING  THE  VECTOR  TO  ONE 
OF  THE  CLASSES  REPRESENTED  BY  THE  PROTOTYPE 
DATA  USED. 

WRITTEN  BY:  MICHAEL  R.  GRAY 
COMPLETED:  MAR  84 
FILENAME:  FUZPROTO.FTN 

CALLING SEQUENCE (FROM A FORTRAN ROUTINE):

CALL FPROTO(FZFIER,KLASES,FFEAT,LFEAT,TEST1)

INPUT  VARIABLES(NOT  CHANGED): 

FZFIER  -  REAL  WEIGHTING  FACTOR  USED  IN 
THE  MEMBERSHIP  ASSIGNMENTS 

KLASES  -  INTEGER  COUNT  OF  NUMBER  OF  CLASSES 

FFEAT  -  INTEGER  WHICH  SETS  THE  FIRST 

FEATURE  IN  THE  DATA  SET  TO  CONSIDER 

LFEAT  -  INTEGER  WHICH  SETS  THE  LAST  FEATURE 
IN  THE  DATA  SET  TO  CONSIDER 

TEST1  -  INTEGER  INDEX  WHICH  POINTS  TO  VECTOR 
IN  DATA  TO  ASSIGN  MEMBERSHIPS 

X  -  REAL  ARRAY  4  BY  242  WHICH  HOLDS  SAMPLES, 

PASSED  IN  LABELLED  FORTRAN  COMMON  "AREA1" 

PROTO - REAL ARRAY 4 BY 3 WHICH HOLDS CLASS
PROTOTYPES,  PASSED  IN  LABELLED 
FORTRAN  COMMON  "AREA2" 

OUTPUT  VARIABLES: 

NMFUNC  -  REAL  ARRAY  3  BY  242  WHICH  HOLDS  THE 

MEMBERSHIP  FUNCTIONS  ASSIGNED,  PASSED 
IN  LABELLED  FORTRAN  COMMON  "AREA5" 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

PSEUDO  -  CODE  SOLUTION 


ENTER FPROTO

   SET CLASS INDEX = 1
   INITIALIZE MATCH = NO
   DO UNTIL(DISTANCE FROM TEST SAMPLE TO ALL
            PROTOTYPES COMPUTED OR A MATCH FOUND)
      COMPUTE DISTANCE FROM CURRENT CLASS
         PROTOTYPE TO TEST SAMPLE
      IF (DISTANCE = 0) THEN
         MATCH = YES
      END IF
      INCREMENT CLASS INDEX
   END DO UNTIL
   IF (MATCH) THEN
      ASSIGN TEST SAMPLE MEMBERSHIP OF
         PROTOTYPE WHICH MATCHED
   ELSE
      ASSIGN MEMBERSHIPS AS A FUNCTION
         OF INVERSE DISTANCES COMPUTED
   END IF
RETURN 
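
A minimal Python sketch of the fuzzy nearest prototype assignment outlined above follows. The function name `fuzzy_prototype` and its list arguments are illustrative assumptions, not part of the thesis software. Memberships are proportional to the inverse squared distance to each prototype raised to 1/(FZFIER-1), normalized to sum to one; a test vector that coincides with a prototype receives full membership in that class.

```python
def fuzzy_prototype(test, prototypes, fuzzifier=2.0):
    """Return memberships of `test` in each prototype's class."""
    power = 1.0 / (fuzzifier - 1.0)
    inverse = []
    for c, p in enumerate(prototypes):
        d = sum((a - b) ** 2 for a, b in zip(p, test))  # squared distance
        if d == 0.0:
            # Exact match: full membership in the matching class only.
            return [1.0 if j == c else 0.0 for j in range(len(prototypes))]
        inverse.append(1.0 / d ** power)
    total = sum(inverse)
    return [w / total for w in inverse]
```

With prototypes at 0 and 4 and a test value of 1, the squared distances are 1 and 9, so the memberships come out near 0.9 and 0.1.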

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

c 


AD-A145 571   AN INVESTIGATION INTO FUZZY CLUSTERING AND CLASSIFICATION (U)   2/2
AIR FORCE INST OF TECH WRIGHT-PATTERSON AFB OH   M R GRAY   JUL 84
UNCLASSIFIED   AFIT/CI/NR-84-47T   F/G 12/1   NL


      SUBROUTINE FPROTO(FZFIER,KLASES,FFEAT,LFEAT,TEST1)
      LOGICAL MATCH
      REAL PROTO(4,3),NMFUNC(3,242),PDIST(3),X(4,242)
      INTEGER FFEAT,CLASS,VECTOR,FETUR,TEST1
      COMMON /AREA2/PROTO /AREA1/X /AREA5/NMFUNC
C
C     COMPUTE POWER WHICH IS A FUNCTION OF THE "FUZZIFIER"
      POWER=1.0/(FZFIER-1.0)
C
      DSUM=0.0
      CLASS=1
C
C     DO UNTIL(DISTANCE CURRENT VECTOR TO EACH
C              PROTOTYPE COMPUTED, OR A MATCH FOUND)
    1 CONTINUE
      MATCH=.FALSE.
      PDIST(CLASS)=0.0
C
C     DO UNTIL(DISTANCE COMPUTED)
      DO 2 FETUR=FFEAT,LFEAT
         TEMP=PROTO(FETUR,CLASS)-X(FETUR,TEST1)
         PDIST(CLASS)=PDIST(CLASS)+TEMP*TEMP
    2 CONTINUE
C     END DO UNTIL
C
      IF(PDIST(CLASS).EQ.0.0) THEN
         MATCH=.TRUE.
         MNUM=CLASS
      ELSE
         PDIST(CLASS)=1.0/PDIST(CLASS)**POWER
         DSUM=DSUM+PDIST(CLASS)
      END IF
C
      CLASS=CLASS+1
C
      IF((.NOT.MATCH).AND.(CLASS.LE.KLASES))
     1   GO TO 1
C     END DO UNTIL
C
      IF(MATCH) THEN
C
C        DO UNTIL(MEMBERSHIP FUNCTIONS OF
C                 CURRENT VECTOR ZEROED)
         DO 3 CLASS=1,KLASES
            NMFUNC(CLASS,TEST1)=0.0
    3    CONTINUE
C        END DO UNTIL
C
C        ASSIGN THE VECTOR MEMBERSHIP IN THE
C        CLASS FOR WHICH A MATCH OCCURRED
         NMFUNC(MNUM,TEST1)=1.0
C
      ELSE
C
C        DO UNTIL(MEMBERSHIP FUNCTIONS
C                 ASSIGNED FOR TEST VECTOR)
         DO 4 CLASS=1,KLASES
            NMFUNC(CLASS,TEST1)=PDIST(CLASS)/DSUM
    4    CONTINUE
C        END DO UNTIL
      END IF
C
      RETURN
      END
$OPTIMIZE

A SUBROUTINE WHICH IMPLEMENTS A CRISP VERSION
OF THE NEAREST PROTOTYPE ALGORITHM. THE
RESULT IS TO ASSIGN MEMBERSHIP FUNCTION
VALUES TO THE VECTOR TO BE CLASSIFIED
INSTEAD OF ASSIGNING THE VECTOR TO ONE OF
THE CLASSES REPRESENTED BY THE PROTOTYPES

WRITTEN BY: MICHAEL R. GRAY

COMPLETED: APR 84

FILENAME: CRISPNP.FTN

CALLING SEQUENCE (FROM A FORTRAN ROUTINE):

CALL CRSPNP(KLASES,FFEAT,LFEAT,TEST1)

INPUT  VARIABLES(NOT  CHANGED): 

KLASES  -  INTEGER  COUNT  OF  NUMBER  OF  CLASSES 

FFEAT  -  INTEGER  WHICH  SETS  THE  FIRST  FEATURE 
IN  DATA  SET  TO  CONSIDER 

LFEAT  -  INTEGER  WHICH  SETS  THE  LAST  FEATURE 
IN  DATA  SET  TO  CONSIDER 

TEST1  -  INTEGER  WHICH  HOLDS  THE  INDEX 
OF  THE  CURRENT  TEST  SAMPLE 

X  -  REAL  ARRAY  4  BY  242  WHICH  HOLDS  ALL  DATA 

VECTORS,  PASSED  IN  LABELLED  FORTRAN  COMMON  "AREA1" 

PROTO - REAL ARRAY 4 BY 3 WHICH HOLDS THE CLASS
PROTOTYPES USED IN MEMBERSHIP FUNCTION
ASSIGNMENT, PASSED IN LABELLED FORTRAN
COMMON "AREA2"

OUTPUT  VARIABLES: 

NMFUNC  -  REAL  ARRAY  3  BY  242  WHICH  HOLDS 

THE  RESULTING  MEMBERSHIP  FUNCTION 
ASSIGNMENTS  FOR  THE  CURRENT  TEST  VECTOR 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

PSEUDO  -  CODE  SOLUTION 


ENTER CRSPNP

   INITIALIZE MATCH = NO
   INITIALIZE CLASS INDEX = 1
   DO UNTIL(DISTANCES FROM TEST SAMPLE TO EACH
            PROTOTYPE COMPUTED OR A MATCH FOUND)
      COMPUTE DISTANCE FROM TEST SAMPLE
         TO CURRENT CLASS PROTOTYPE
      IF (DISTANCE = 0) THEN
         MATCH = YES
      END IF
      INCREMENT CLASS INDEX
   END DO UNTIL
   IF (MATCH = YES) THEN
      ASSIGN TEST VECTOR MEMBERSHIPS
         OF MATCHING PROTOTYPE
   ELSE
      DETERMINE CLASS OF CLOSEST PROTOTYPE
      IF (A TIE EXISTS) THEN
         ASSIGN TEST SAMPLE TO CLASS OF LAST
            CLOSE PROTOTYPE DETECTED
      ELSE
         ASSIGN TEST SAMPLE TO CLASS OF
            CLOSEST PROTOTYPE
      END IF
   END IF
RETURN

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
      SUBROUTINE CRSPNP(KLASES,FFEAT,LFEAT,TEST1)
C
      LOGICAL MATCH
      REAL PROTO(4,3),NMFUNC(3,242),PDIST(3),X(4,242)
      INTEGER FFEAT,CLASS,VECTOR,FETUR,TEST1
      COMMON /AREA2/PROTO /AREA1/X /AREA5/NMFUNC
C
      DSUM=0.0
      CLASS=1
      MATCH=.FALSE.
C
C     DO UNTIL(DISTANCE CURRENT VECTOR TO EACH
C              PROTOTYPE COMPUTED, OR A MATCH FOUND)
    1 CONTINUE
      PDIST(CLASS)=0.0
C
C     DO UNTIL(DISTANCE COMPUTED)
      DO 2 FETUR=FFEAT,LFEAT
         TEMP=PROTO(FETUR,CLASS)-X(FETUR,TEST1)
         PDIST(CLASS)=PDIST(CLASS)+TEMP*TEMP
    2 CONTINUE
C     END DO UNTIL
C
      IF(PDIST(CLASS).EQ.0.0) THEN
         MATCH=.TRUE.
         MNUM=CLASS
      END IF
C
      CLASS=CLASS+1
C
      IF((.NOT.MATCH).AND.(CLASS.LE.KLASES))
     1   GO TO 1
C     END DO UNTIL
C
C     DO UNTIL(MEMBERSHIP FUNCTIONS OF
C              CURRENT VECTOR ZEROED)
      DO 3 CLASS=1,KLASES
         NMFUNC(CLASS,TEST1)=0.0
    3 CONTINUE
C     END DO UNTIL
C
      IF(MATCH) THEN
C
C        ASSIGN THE VECTOR MEMBERSHIP IN THE CLASS
C        FOR WHICH A MATCH OCCURRED
         NMFUNC(MNUM,TEST1)=1.0
C
      ELSE
C
C        DO UNTIL(CLOSEST OF THE PROTOTYPES FOUND)
         DO 4 CLASS=1,KLASES
            IF(CLASS.EQ.1) THEN
               CLOSE=PDIST(CLASS)
               NUMBER=CLASS
            ELSE IF(PDIST(CLASS).LE.CLOSE) THEN
               CLOSE=PDIST(CLASS)
               NUMBER=CLASS
            END IF
    4    CONTINUE
C        END DO UNTIL
C
C        ASSIGN VECTOR TO CLASS OF CLOSEST PROTOTYPE
C        (IF A TIE, CLOSEST = LAST CLOSEST FOUND)
         NMFUNC(NUMBER,TEST1)=1.0
      END IF
      RETURN
      END
$OPTIMIZE

A SUBROUTINE WHICH ASSIGNS MEMBERSHIP
FUNCTION VALUES BASED ON THE NEAREST
NEIGHBOR "FUZZIFYING" RULE.

WRITTEN BY: MICHAEL R. GRAY

COMPLETED: JUNE 84

FILENAME: MGFZFYNN.FTN

CALLING SEQUENCE (FROM A FORTRAN ROUTINE): CALL
FZFYNN(FZFIER,KLASES,VCOUNT,FFEAT,LFEAT,K,TEST1)

INPUT  VARIABLES(NOT  CHANGED): 

FZFIER  -  REAL  VALUE  OF  THE  WEIGHTING  FACTOR 

KLASES  -  INTEGER  COUNT  OF  NUMBER  OF  CLASSES 

VCOUNT  -  INTEGER  COUNT  OF  NUMBER  OF  SAMPLE  VECTORS 

FFEAT - INTEGER WHICH SETS FIRST FEATURE
IN SAMPLE VECTORS TO CONSIDER

LFEAT - INTEGER WHICH SETS LAST FEATURE
IN SAMPLE VECTORS TO CONSIDER

K  -  INTEGER  COUNT  OF  NUMBER  OF  NEIGHBORS  TO 
USE  FOR  MEMBERSHIP  FUNCTION  INITIALIZATION 

TEST1  -  INTEGER  INDEX  OF  TEST  SAMPLE 

X  -  REAL  ARRAY  4  BY  242  WHICH  HOLDS  ALL  SAMPLE  VECTORS, 
PASSED  IN  LABELLED  FORTRAN  COMMON  "AREA1" 

START  -  INTEGER  ARRAY  OF  DIMENSION  3  WHICH  HOLDS  THE 
STARTING  INDICES  OF  EACH  CLASS  OF  SAMPLES, 
PASSED  IN  LABELLED  FORTRAN  COMMON  "AREA6" 

END  -  INTEGER  ARRAY  OF  DIMENSION  3  WHICH  HOLDS 

THE  ENDING  INDICES  OF  EACH  CLASS  OF  SAMPLES, 
PASSED  IN  LABELLED  FORTRAN  COMMON  "AREA6" 

OUTPUT  VARIABLES: 

MFUNCT - REAL ARRAY 3 BY 242 WHICH HOLDS
THE MEMBERSHIP FUNCTION ASSIGNMENTS,
PASSED IN LABELLED FORTRAN COMMON "AREA3"


PSEUDO  -  CODE  SOLUTION 


ENTER FZFYNN

   INITIALIZE VECTOR INDEX = 1
   DO UNTIL(EACH LABELLED SAMPLE USED TO CLASSIFY
            "TEST1" ASSIGNED MEMBERSHIPS)
      IF (VECTOR NOT EQUAL TO "TEST1") THEN
         FIND K NEAREST NEIGHBORS IN LABELLED
            SAMPLE SET TO CURRENT VECTOR
         DETERMINE COUNT OF NEAREST NEIGHBORS
            IN EACH CLASS TO USE FOR INITIALIZATION
         INITIALIZE CLASS INDEX = 1
         DO UNTIL(CURRENT VECTOR ASSIGNED
                  MEMBERSHIP IN ALL CLASSES)
            IF (CLASS INDEX = KNOWN CLASS
                OF CURRENT VECTOR) THEN
               ASSIGN MEMBERSHIP = 0.51 + (COUNT OF "K"
                  NEAREST NEIGHBORS FOUND IN CLASS/"K") * 0.49
            ELSE
               ASSIGN MEMBERSHIP = (COUNT OF "K"
                  NEAREST NEIGHBORS FOUND IN CLASS/"K") * 0.49
            END IF
            INCREMENT CLASS INDEX
         END DO UNTIL
      END IF
      INCREMENT VECTOR INDEX
   END DO UNTIL
RETURN 
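
The labelling rule above can be illustrated with the following Python sketch. The function name `fuzzify_labels` and the explicit `labels` list (standing in for the START/END class index ranges) are assumptions, and the sketch omits the exclusion of the current test sample "TEST1" that the subroutine applies. Each sample keeps at least 0.51 membership in its known class, and the remaining 0.49 is divided among the classes in proportion to how many of its K nearest neighbors fall in each.

```python
def fuzzify_labels(samples, labels, k, n_classes):
    """Return one membership row per sample using the 0.51/0.49 rule."""
    memberships = []
    for i, s in enumerate(samples):
        # Squared distances to every other labelled sample.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(s, t)), j)
            for j, t in enumerate(samples)
            if j != i
        )
        counts = [0] * n_classes
        for _, j in dists[:k]:
            counts[labels[j]] += 1
        row = [
            0.49 * counts[c] / k + (0.51 if c == labels[i] else 0.0)
            for c in range(n_classes)
        ]
        memberships.append(row)
    return memberships
```

Since the neighbor counts sum to K, each row of memberships sums to one.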

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

      SUBROUTINE FZFYNN(FZFIER,KLASES,VCOUNT,FFEAT,
     1                  LFEAT,K,TEST1)
C
      REAL MFUNCT(3,242),X(4,242),DNEAR(10)
      INTEGER FFEAT,COUNTR,VECTOR,NEAR(10)
      INTEGER START(3),END(3),VCOUNT,TEST1,VTRAIN
      INTEGER FETUR,CLASS,COUNT(3),VCLASS
      COMMON /AREA3/MFUNCT /AREA1/X /AREA6/START,END
C
C     DO UNTIL(ALL DATA ASSIGNED MEMBERSHIPS)
      DO 40 VTRAIN=1,VCOUNT
         IF(VTRAIN.NE.TEST1) THEN
            COUNTR=1
            DSTMAX=0.0
            MAXNER=1
C
C           DO UNTIL(K NEAREST NEIGHBORS FOUND)
            DO 5 VECTOR=1,VCOUNT
               IF((VECTOR.NE.VTRAIN).AND.
     1            (VECTOR.NE.TEST1)) THEN
                  DIST=0.0
C                 DO UNTIL(DISTANCE COMPUTED)
                  DO 3 FETUR=FFEAT,LFEAT
                     TEMP=X(FETUR,VTRAIN)-X(FETUR,VECTOR)
                     DIST=DIST+TEMP*TEMP
    3             CONTINUE
C                 END DO UNTIL
                  IF(COUNTR.LE.K) THEN
                     DNEAR(COUNTR)=DIST
                     NEAR(COUNTR)=VECTOR
                     IF(DNEAR(COUNTR).GT.DSTMAX) THEN
                        DSTMAX=DNEAR(COUNTR)
                        MAXNER=COUNTR
                     END IF
                     COUNTR=COUNTR+1
                  ELSE IF(DIST.LT.DNEAR(MAXNER)) THEN
                     DNEAR(MAXNER)=DIST
                     NEAR(MAXNER)=VECTOR
C                    DO UNTIL(NEW MAXIMUM DISTANCE OF
C                             K NEAREST NEIGHBORS FOUND)
                     DO 4 INDEX=1,K
                        IF(DNEAR(INDEX).GT.DNEAR(MAXNER)) THEN
                           MAXNER=INDEX
                        END IF
    4                CONTINUE
C                    END DO UNTIL
                  END IF
               END IF
    5       CONTINUE
C           END DO UNTIL
C
C           DO UNTIL(CLASS OF NEAREST NEIGHBORS
C                    AND "VTRAIN" DETERMINED)
            DO 10 CLASS=1,KLASES
               COUNT(CLASS)=0
               DO 7 INDEX=1,K
                  IF((NEAR(INDEX).GE.START(CLASS)).AND.
     1               (NEAR(INDEX).LE.END(CLASS))) THEN
                     COUNT(CLASS)=COUNT(CLASS)+1
                  END IF
    7          CONTINUE
               IF((VTRAIN.GE.START(CLASS)).AND.
     1            (VTRAIN.LE.END(CLASS))) THEN
                  VCLASS=CLASS
               END IF
   10       CONTINUE
C           END DO UNTIL
C
C           DO UNTIL(MEMBERSHIP FOR VECTOR
C                    NUMBER "VTRAIN" ASSIGNED)
            DO 15 CLASS=1,KLASES
               IF(CLASS.EQ.VCLASS) THEN
                  MFUNCT(CLASS,VTRAIN)=0.51+
     1               (COUNT(CLASS)*0.49)/K
               ELSE
                  MFUNCT(CLASS,VTRAIN)=(COUNT(CLASS)*0.49)/K
               END IF
   15       CONTINUE
C           END DO UNTIL
C
         END IF
   40 CONTINUE
C     END DO UNTIL
C
      RETURN
      END
$OPTIMIZE

C 

C  A  SUBROUTINE  WHICH  INITIALIZES  THE  MEMBERSHIP 
C  ARRAY  USED  IN  A  FUZZY  CLASSIFICATION  ALGORITHM. 

C 

C  WRITTEN  BY:  MICHAEL  R.  GRAY 
C 

C  COMPLETED:  APR  84 
C 

C  CALLING SEQUENCE (FROM A FORTRAN ROUTINE):

C  CALL INITMF(VCOUNT,FFEAT,LFEAT,KLASES,HIGH,LOW)

C

C  INPUT VARIABLES (NOT CHANGED):

C 

C  VCOUNT  -  THE  NUMBER  OF  VECTORS  IN  VECTOR ,  AN  INTEGER. 

C 

C  FFEAT  -  THE  FIRST  FEATURE  TO  BE  CONSIDERED 
C  IN  THE  DATA  VECTORS,  AN  INTEGER. 

C 

C  LFEAT  -  THE  LAST  FEATURE  TO  BE  CONSIDERED 
C  IN  THE  DATA  VECTORS,  AN  INTEGER. 

C 

C  HIGH  -  THE  REAL  VALUE  TO  USE  FOR  "HIGH"  MEMBERSHIP. 

C 

C  LOW  -  THE  REAL  VALUE  TO  USE  FOR  "LOW"  MEMBERSHIP. 

C 

C  X  -  A  REAL  ARRAY  OF  DIMENSION  (4,242),  DATA  SET 
C  TO  BE  CLASSIFIED  OR  CLUSTERED,  PASSED  IN 

C  LABELLED  FORTRAN  COMMON  "AREA1" 

C 

C  OUTPUT  VARIABLES: 

C 

C  MFUNCT - A REAL ARRAY OF DIMENSION (3,242), CONTAINS
C  "KLASES" CLASSES, "VCOUNT" VECTORS,
C  PASSED IN LABELLED FORTRAN COMMON "AREA3"

C 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

C 

C  PSEUDO  -  CODE  SOLUTION 

C 

C  ENTER INITMF
C     COMPUTE MEAN OF ENTIRE DATA SET
C     FIND SAMPLE FARTHEST FROM SAMPLE
C        MEAN AND ASSIGN IT AS "FAR1"
C     IF (KLASES >= 3) THEN
C        FIND SAMPLE FARTHEST FROM "FAR1"
C           AND ASSIGN IT AS "FAR2"
C     END IF
C     ASSIGN ALL SAMPLES WITH "HIGH" MEMBERSHIP IN
C        CLASS 2 AND "LOW" MEMBERSHIP IN ALL OTHER CLASSES
C     IF (CLASSES <= 2) THEN
C        REASSIGN SAMPLE "FAR1" WITH "HIGH" MEMBERSHIP IN
C           CLASS 1 AND "LOW" MEMBERSHIP IN ALL OTHER CLASSES
C     ELSE IF (CLASSES >= 3) THEN
C        REASSIGN SAMPLE "FAR1" WITH "HIGH" MEMBERSHIP IN
C           CLASS 3 AND "LOW" MEMBERSHIP IN ALL OTHER CLASSES
C        REASSIGN SAMPLE "FAR2" WITH "HIGH" MEMBERSHIP IN
C           CLASS 1 AND "LOW" MEMBERSHIP IN ALL OTHER CLASSES
C     END IF
C  RETURN 
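
The initialization scheme described in the pseudo-code can be sketched in Python as follows. This is an illustrative reading only: the name `init_memberships`, the row-per-sample layout of the result, and the requirement of at least two classes are assumptions. Every sample starts with "HIGH" membership in class 2; the sample farthest from the data mean (FAR1) and, for three or more classes, the sample farthest from FAR1 (FAR2) are then moved to other classes to seed them.

```python
def init_memberships(x, n_classes, high, low):
    """Return one membership row per sample (class 1 is index 0)."""
    n = len(x)
    mean = [sum(col) / n for col in zip(*x)]

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # Sample farthest from the overall mean (first maximum wins).
    far1 = max(range(n), key=lambda v: dist2(x[v], mean))
    rows = [[low] * n_classes for _ in range(n)]
    for row in rows:
        row[1] = high  # default: "HIGH" membership in class 2
    if n_classes <= 2:
        rows[far1] = [low] * n_classes
        rows[far1][0] = high  # FAR1 seeds class 1
    else:
        # Sample farthest from FAR1 seeds class 1; FAR1 seeds class 3.
        far2 = max(range(n), key=lambda v: dist2(x[v], x[far1]))
        rows[far1] = [low] * n_classes
        rows[far1][2] = high
        rows[far2] = [low] * n_classes
        rows[far2][0] = high
    return rows
```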
C 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

c 

      SUBROUTINE INITMF(VCOUNT,FFEAT,LFEAT,KLASES,HIGH,LOW)
C
      INTEGER FFEAT,FETUR,VECTOR,VCOUNT,FAR1,FAR2
      REAL MFUNCT(3,242),X(4,242),SMEAN(4),LOW
      COMMON /AREA3/MFUNCT /AREA1/X
C
C     DO UNTIL(SAMPLE MEAN VECTOR SET TO ZERO)
      DO 1 FETUR=FFEAT,LFEAT
         SMEAN(FETUR)=0.0
    1 CONTINUE
C     END DO UNTIL
C
C     DO UNTIL(ALL VECTORS IN DATA SET SUMMED)
      DO 3 VECTOR=1,VCOUNT
C        DO UNTIL(VECTOR NUMBER "VECTOR" INCLUDED IN SUM)
         DO 2 FETUR=FFEAT,LFEAT
            SMEAN(FETUR)=SMEAN(FETUR)+X(FETUR,VECTOR)
    2    CONTINUE
C        END DO UNTIL
    3 CONTINUE
C     END DO UNTIL
C
C     DO UNTIL(ALL COMPONENTS OF SAMPLE MEAN VECTOR
C              DIVIDED BY COUNT OF VECTORS IN DATA SET)
      DO 4 FETUR=FFEAT,LFEAT
         SMEAN(FETUR)=SMEAN(FETUR)/VCOUNT
    4 CONTINUE
C     END DO UNTIL
C
      DMAX=0.0
C
C     DO UNTIL(VECTOR FARTHEST FROM SAMPLE MEAN FOUND)
      DO 6 VECTOR=1,VCOUNT
         DIST1=0.0
C        DO UNTIL(DISTANCE SQUARED FROM CURRENT
C                 VECTOR TO SAMPLE MEAN FOUND)
         DO 5 FETUR=FFEAT,LFEAT
            TEMP=X(FETUR,VECTOR)-SMEAN(FETUR)
            DIST1=DIST1+TEMP*TEMP
    5    CONTINUE
C        END DO UNTIL
C
         IF(DIST1.GT.DMAX) THEN
            DMAX=DIST1
            FAR1=VECTOR
         END IF
    6 CONTINUE
C     END DO UNTIL
C
      IF(KLASES.GE.3) THEN
         DMAX=0.0
C
C        DO UNTIL(VECTOR FARTHEST FROM THE
C                 ONE FOUND ABOVE LOCATED)
         DO 8 VECTOR=1,VCOUNT
            DIST1=0.0
C           DO UNTIL(DISTANCE SQUARED FROM CURRENT
C                    VECTOR TO VECTOR "FAR1" FOUND)
            DO 7 FETUR=FFEAT,LFEAT
               TEMP=X(FETUR,VECTOR)-X(FETUR,FAR1)
               DIST1=DIST1+TEMP*TEMP
    7       CONTINUE
C           END DO UNTIL
C
            IF(DIST1.GT.DMAX) THEN
               DMAX=DIST1
               FAR2=VECTOR
            END IF
    8    CONTINUE
C        END DO UNTIL
      END IF
C
C     ASSIGN MEMBERSHIP FUNCTIONS AS FOLLOWS:
C
C     DO UNTIL(ALL VECTORS GIVEN "HIGH"
C              MEMBERSHIP IN CLASS 2)
      DO 9 VECTOR=1,VCOUNT
         MFUNCT(1,VECTOR)=LOW
         MFUNCT(2,VECTOR)=HIGH
         MFUNCT(3,VECTOR)=LOW
    9 CONTINUE
C     END DO UNTIL
C
      IF(KLASES.LE.2) THEN
C
C        GIVE VECTOR NUMBER "FAR1" A "HIGH"
C        MEMBERSHIP IN CLASS 1
         MFUNCT(1,FAR1)=HIGH
         MFUNCT(2,FAR1)=LOW
         MFUNCT(3,FAR1)=LOW
C
      ELSE IF(KLASES.GE.3) THEN
C
C        GIVE VECTOR NUMBER "FAR1" HIGH
C        MEMBERSHIP IN CLASS 3
         MFUNCT(1,FAR1)=LOW
         MFUNCT(2,FAR1)=LOW
         MFUNCT(3,FAR1)=HIGH
C
C        GIVE VECTOR NUMBER "FAR2" HIGH
C        MEMBERSHIP IN CLASS 1
$OPTIMIZE

A  SUBROUTINE  WHICH  OUTPUTS  THE  MEMBERSHIP  FUNCTION 

ARRAY  SELECTED  VIA  VALUE  OF  "CHOICE"  AND  "OPTION". 

WRITTEN  BY:  MICHAEL  R.  GRAY 

COMPLETED:  MAY  84 

FILENAME:  MGMEMBPR.FTN 

CALLING SEQUENCE (FROM A FORTRAN ROUTINE):

CALL MEMBPR(KLASES,VCOUNT,LU,CHOICE,OPTION)

INPUT  VARIABLES(NOT  CHANGED): 

KLASES  -  INTEGER  COUNT  OF  NUMBER  OF  CLASSES 

VCOUNT  -  INTEGER  COUNT  OF  SAMPLES  IN  DATA  SET 

LU  -  INTEGER  WHICH  HOLDS  THE  LOGICAL  UNIT 
NUMBER  USED  FOR  OUTPUT 

CHOICE  -  INTEGER  WHICH  SELECTS  WHICH  MEMBERSHIP 
FUNCTION  ARRAY  TO  OUTPUT 

OPTION - INTEGER WHICH SELECTS WHETHER TO OUTPUT
ENTIRE MEMBERSHIP ARRAY "NMFUNC" OR ONLY THE
PORTION WHICH INCLUDES MISCLASSIFIED SAMPLES

MFUNCT  -  REAL  ARRAY  3  BY  242  WHICH  HOLDS  MEMBERSHIPS 
PRODUCED  EITHER  BY  A  CLUSTERING  ALGORITHM 
OR  DIRECT  ASSIGNMENT,  PASSED  IN  LABELLED 
FORTRAN  COMMON  "AREA3" 

NMFUNC - REAL ARRAY 3 BY 242 WHICH HOLDS MEMBERSHIP
ASSIGNMENTS PRODUCED BY ONE OF THE
CLASSIFIER ALGORITHMS, PASSED IN
LABELLED FORTRAN COMMON "AREA5"

WRONG - INTEGER*2 ARRAY OF DIMENSION 242 WHICH
HOLDS THE INDICES OF MISCLASSIFIED VECTORS,
PASSED IN LABELLED FORTRAN COMMON "AREA4"

WRNCNT  -  INTEGER  COUNT  OF  MISCLASSIFIED  SAMPLES 
WHOSE  INDICES  ARE  STORED  IN  "WRONG", 

PASSED  IN  LABELLED  FORTRAN  COMMON  "AREA4" 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

PSEUDO  -  CODE 


ENTER MEMBPR

   IF (CHOICE <= 2) THEN
      OUTPUT HEADING FOR MEMBERSHIP ARRAY
         PRODUCED BY CLUSTERING ALGORITHM
   ELSE
      OUTPUT HEADING FOR MEMBERSHIP ARRAY
         PRODUCED BY CLASSIFICATION ALGORITHM
   END IF
   IF (OPTION = 1) THEN
      IF (CHOICE <= 2) THEN
         OUTPUT MEMBERSHIP ARRAY PRODUCED
            BY CLUSTERING ALGORITHM
      ELSE
         OUTPUT MEMBERSHIP ARRAY PRODUCED
            BY CLASSIFICATION ALGORITHM
      END IF
   ELSE
      IF (CHOICE <= 2) THEN
         OUTPUT MEMBERSHIPS OF SAMPLES MISCLASSIFIED
            BY CLUSTERING ALGORITHM
      ELSE
         OUTPUT MEMBERSHIPS OF SAMPLES MISCLASSIFIED
            BY CLASSIFICATION ALGORITHM
      END IF
   END IF
RETURN
C 

cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

c 

      SUBROUTINE MEMBPR(KLASES,VCOUNT,LU,CHOICE,OPTION)
C
      REAL MFUNCT(3,242),NMFUNC(3,242)
      INTEGER VCOUNT,CLASS,CHOICE,OPTION,WRNCNT,COUNT
      INTEGER*2 WRONG(242)
      COMMON /AREA3/MFUNCT /AREA4/WRNCNT,WRONG
      COMMON /AREA5/NMFUNC
C
      IF(CHOICE.LE.2) THEN
         WRITE(LU,1)
    1    FORMAT(/,43X,'PROTOTYPE DATA MEMBERSHIP',
     1          ' FUNCTION ARRAY')
      ELSE
         WRITE(LU,2)
    2    FORMAT(/,43X,'NEWLY ASSIGNED MEMBERSHIP',
     1          ' FUNCTION ARRAY')
      END IF
C
      IF(OPTION.EQ.1) THEN
         WRITE(LU,3)
    3    FORMAT(' ','CLASS')
         DO 11 INDEX=1,VCOUNT,20
            IF((INDEX+20).GT.VCOUNT) THEN
               COUNT=VCOUNT
            ELSE
               COUNT=INDEX+19
            END IF
            WRITE(LU,4) (I,I=INDEX,COUNT)
    4       FORMAT(' ',6X,'X',1X,20(2X,I3,1X))
            DO 8 CLASS=1,KLASES
               IF(CHOICE.LE.2) THEN
                  WRITE(LU,7) CLASS,(MFUNCT(CLASS,I),
     1                        I=INDEX,COUNT)
    7             FORMAT(' ',I3,4X,20F6.3)
               ELSE
                  WRITE(LU,7) CLASS,(NMFUNC(CLASS,I),
     1                        I=INDEX,COUNT)
               END IF
    8       CONTINUE
   11    CONTINUE
C
      ELSE
C
         WRITE(LU,12)
   12    FORMAT(/,50X,'MISCLASSIFIED VECTORS ONLY')
         WRITE(LU,3)
C
         DO 18 INDEX=1,WRNCNT,20
            IF((INDEX+20).GT.WRNCNT) THEN
               COUNT=WRNCNT
            ELSE
               COUNT=INDEX+19
            END IF
            WRITE(LU,4) (WRONG(I),I=INDEX,COUNT)
            DO 14 CLASS=1,KLASES
               IF(CHOICE.LE.2) THEN
                  WRITE(LU,7) CLASS,(MFUNCT(CLASS,
     1                        WRONG(I)),I=INDEX,COUNT)
               ELSE
                  WRITE(LU,7) CLASS,(NMFUNC(CLASS,
     1                        WRONG(I)),I=INDEX,COUNT)
               END IF
   14       CONTINUE
   18    CONTINUE
      END IF
C
      RETURN
      END
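The block-of-20 column logic in MEMBPR can be tried on its own. The following is a minimal sketch and not part of the original listing: the program name BLKDEM, the variables FIRST and LAST, and the sample count of 45 are all assumptions made for illustration. It uses the generic MIN intrinsic, which gives the same upper bound as the IF/ELSE test on INDEX+20 in the subroutine.

```fortran
      PROGRAM BLKDEM
C     HYPOTHETICAL DEMONSTRATION, NOT PART OF THE ORIGINAL LISTING.
C     PRINTS THE FIRST AND LAST COLUMN INDEX OF EACH BLOCK OF UP
C     TO 20 COLUMNS FOR AN ASSUMED SAMPLE COUNT OF 45.
      INTEGER VCOUNT,FIRST,LAST
      VCOUNT=45
      DO 10 FIRST=1,VCOUNT,20
C        LAST COLUMN OF THIS BLOCK, CLIPPED TO THE SAMPLE COUNT
         LAST=MIN(FIRST+19,VCOUNT)
         WRITE(*,20) FIRST,LAST
   10 CONTINUE
   20 FORMAT(' COLUMNS',I4,' TO',I4)
      END
```

With 45 samples the blocks cover columns 1 to 20, 21 to 40, and 41 to 45, so the final block is shorter than the 20 values allowed by the FORMAT repeat count.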
$OPTIMIZE

C
C     A SUBROUTINE WHICH COMPUTES THE UPPER AND
C     LOWER CUT-SETS OF A FUZZY MEMBERSHIP
C     FUNCTION ARRAY AND OUTPUTS THE COUNTS OF
C     SAMPLE VECTORS IN THE RESULTING CUT-SETS.
C
C     WRITTEN BY: MICHAEL R. GRAY
C
C     COMPLETED: APR 84
C
C     FILENAME: MGCUTSET.FTN
C
C     CALLING SEQUENCE (FROM A FORTRAN ROUTINE):
C
C        CALL CUTSET(ALPHA,BETA,KLASES,VCOUNT,CHOICE,LU)
C
C     INPUT VARIABLES (NOT CHANGED):
C
C        ALPHA  - REAL VALUE OF THE UPPER MEMBERSHIP
C                 LIMIT FOR THE CUT-SET TO BE ASSIGNED
C
C        BETA   - REAL VALUE OF THE LOWER MEMBERSHIP
C                 LIMIT FOR THE CUT-SET TO BE ASSIGNED
C
C        KLASES - INTEGER COUNT OF NUMBER OF CLASSES
C
C        VCOUNT - INTEGER COUNT OF NUMBER OF SAMPLE VECTORS
C
C        CHOICE - INTEGER WHICH CHOOSES BETWEEN CUT-SETS
C                 OF MEMBERSHIPS IN ARRAY ASSIGNED BY
C                 CLUSTERING OR CLASSIFICATION ALGORITHM
C
C        LU     - INTEGER VALUE WHICH SETS THE LOGICAL
C                 UNIT FOR OUTPUT
C
C        MFUNCT - REAL ARRAY 3 BY 242 WHICH HOLDS THE
C                 MEMBERSHIPS COMPUTED BY A CLUSTERING
C                 ALGORITHM, PASSED IN LABELLED
C                 FORTRAN COMMON "AREA3"
C
C        NMFUNC - REAL ARRAY 3 BY 242 WHICH HOLDS THE
C                 MEMBERSHIPS COMPUTED BY A
C                 CLASSIFICATION ALGORITHM, PASSED IN
C                 LABELLED FORTRAN COMMON "AREA5"
C
C        WRONG  - INTEGER*2 ARRAY OF DIMENSION 242 WHICH HOLDS
C                 THE INDICES OF MISCLASSIFIED SAMPLES, PASSED
C                 IN LABELLED FORTRAN COMMON "AREA4"
C
C        WRNCNT - INTEGER WHICH SPECIFIES NUMBER OF
C                 MISCLASSIFIED SAMPLE INDICES IN "WRONG",
C                 PASSED IN LABELLED FORTRAN COMMON "AREA4"
C
C     OUTPUT VARIABLES:
C
C        KALPHA - INTEGER*2 ARRAY 3 BY 242 WHICH HOLDS
C                 INDICES OF SAMPLES WITH MEMBERSHIP
C                 > "ALPHA" FOR EACH CLASS
C
C        ACOUNT - INTEGER ARRAY OF DIMENSION 3 WHICH HOLDS
C                 COUNT OF SAMPLES IN "KALPHA" FOR EACH CLASS
C
C        KBETA  - INTEGER*2 ARRAY 3 BY 242 WHICH HOLDS
C                 INDICES OF SAMPLES WITH MEMBERSHIP
C                 < "BETA" FOR EACH CLASS
C
C        BCOUNT - INTEGER ARRAY OF DIMENSION 3 WHICH HOLDS
C                 COUNT OF SAMPLES IN "KBETA" FOR EACH CLASS
C
C        BETWEN - INTEGER*2 ARRAY 3 BY 242 WHICH HOLDS INDICES
C                 OF SAMPLES WITH MEMBERSHIP >= "BETA" AND
C                 <= "ALPHA" FOR EACH CLASS
C
C        ICOUNT - INTEGER ARRAY OF DIMENSION 3 WHICH HOLDS
C                 COUNT OF SAMPLES IN "BETWEN" FOR EACH CLASS
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     PSEUDO - CODE SOLUTION
C
C     ENTER CUTSET
C     PROMPT USER TO CHOOSE IF ALL SAMPLES TO BE
C     TESTED ("ICHOICE"=1) OR ONLY MISCLASSIFIED
C     SAMPLES TO BE TESTED ("ICHOICE"=2)
C     IF(ICHOICE=2) THEN
C        SET SAMPLE COUNT TO "WRNCNT"
C     ELSE
C        SET SAMPLE COUNT TO "VCOUNT"
C     END IF
C     DO UNTIL(ALL CLASSES CONSIDERED)
C        IF(CHOICE <= 2) THEN
C           GET INDICES AND COUNT OF SAMPLES WITH
C           MEMBERSHIPS > "ALPHA" IN CURRENT
C           CLASS USING "MFUNCT" MEMBERSHIPS
C           GET INDICES AND COUNT OF SAMPLES WITH
C           MEMBERSHIPS < "BETA" IN CURRENT
C           CLASS USING "MFUNCT" MEMBERSHIPS
C           GET INDICES AND COUNT OF SAMPLES WITH
C           MEMBERSHIPS <= "ALPHA" AND >= "BETA" IN
C           CURRENT CLASS USING "MFUNCT" MEMBERSHIPS
C        ELSE
C           GET INDICES AND COUNT OF SAMPLES WITH
C           MEMBERSHIPS > "ALPHA" IN CURRENT
C           CLASS USING "NMFUNC" MEMBERSHIPS
C           GET INDICES AND COUNT OF SAMPLES WITH
C           MEMBERSHIPS < "BETA" IN CURRENT
C           CLASS USING "NMFUNC" MEMBERSHIPS
C           GET INDICES AND COUNT OF SAMPLES WITH
C           MEMBERSHIPS <= "ALPHA" AND >= "BETA" IN
C           CURRENT CLASS USING "NMFUNC" MEMBERSHIPS
C        END IF
C     END DO UNTIL
C     OUTPUT TO "LU" THE COUNTS OF SAMPLES IN THE
C     UPPER, LOWER, AND INNER CUT-SETS
C     RETURN
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

      SUBROUTINE CUTSET(ALPHA,BETA,KLASES,VCOUNT,CHOICE,LU)
C
      REAL MFUNCT(3,242),NMFUNC(3,242)
      INTEGER VCOUNT,CLASS,VECTOR,CHOICE
      INTEGER ACOUNT(3),BCOUNT(3),ICOUNT(3),WRNCNT
      INTEGER*2 KALPHA(3,242),KBETA(3,242),BETWEN(3,242)
      INTEGER*2 WRONG(242)
C
      COMMON /AREA3/MFUNCT /AREA4/WRNCNT,WRONG /AREA5/NMFUNC
      COMMON /AREA8/ACOUNT,BCOUNT,ICOUNT,KALPHA,KBETA,BETWEN
C
      WRITE(5,1)
    1 FORMAT(/,2X,'ENTER YOUR CHOICE:',//,5X,
     1'1 - CONSIDER ALL VECTORS',/,5X,
     2'2 - CONSIDER ONLY MISCLASSIFIED VECTORS')
      READ(5,2) ICHOICE
    2 FORMAT(I1)
C
      IF(ICHOICE.EQ.2) THEN
         NUMBER=WRNCNT
      ELSE
         NUMBER=VCOUNT
      END IF
C
      DO 12 CLASS=1,KLASES
         ACOUNT(CLASS)=0
         BCOUNT(CLASS)=0
         ICOUNT(CLASS)=0
         IF(CHOICE.LE.2) THEN
C           DO UNTIL(CHOSEN VECTORS OF DATA SET
C                    FOR CURRENT CLASS CHECKED)
            DO 4 INDEX=1,NUMBER
               IF(ICHOICE.EQ.2) THEN
                  VECTOR=WRONG(INDEX)
               ELSE
                  VECTOR=INDEX
               END IF
               IF(MFUNCT(CLASS,VECTOR).GT.ALPHA) THEN
                  ACOUNT(CLASS)=ACOUNT(CLASS)+1
                  KALPHA(CLASS,ACOUNT(CLASS))=VECTOR
               ELSE IF(MFUNCT(CLASS,VECTOR).LT.BETA) THEN
                  BCOUNT(CLASS)=BCOUNT(CLASS)+1
                  KBETA(CLASS,BCOUNT(CLASS))=VECTOR
               ELSE
                  ICOUNT(CLASS)=ICOUNT(CLASS)+1
                  BETWEN(CLASS,ICOUNT(CLASS))=VECTOR
               END IF
    4       CONTINUE
C           END DO UNTIL
         ELSE
C           DO UNTIL(ALL VECTORS OF TEST SET
C                    FOR CURRENT CLASS CHECKED)
            DO 6 INDEX=1,NUMBER
               IF(ICHOICE.EQ.2) THEN
                  VECTOR=WRONG(INDEX)
               ELSE
                  VECTOR=INDEX
               END IF
               IF(NMFUNC(CLASS,VECTOR).GT.ALPHA) THEN
                  ACOUNT(CLASS)=ACOUNT(CLASS)+1
                  KALPHA(CLASS,ACOUNT(CLASS))=VECTOR
               ELSE IF(NMFUNC(CLASS,VECTOR).LT.BETA) THEN
                  BCOUNT(CLASS)=BCOUNT(CLASS)+1
                  KBETA(CLASS,BCOUNT(CLASS))=VECTOR
               ELSE
                  ICOUNT(CLASS)=ICOUNT(CLASS)+1
                  BETWEN(CLASS,ICOUNT(CLASS))=VECTOR
               END IF
    6       CONTINUE
C           END DO UNTIL
         END IF
   12 CONTINUE
C     END DO UNTIL
C
      IF(ICHOICE.EQ.2) THEN
         WRITE(LU,13)
   13    FORMAT(/,' THE FOLLOWING CONSIDERS ',
     1          'MISCLASSIFIED VECTORS.')
      ELSE
         WRITE(LU,14)
   14    FORMAT(/,' THE FOLLOWING CONSIDERS ',
     1          'ALL VECTORS IN DATA SET.')
      END IF
C
      DO 16 ICLASS=1,KLASES
         WRITE(LU,15) ACOUNT(ICLASS),ALPHA,ICLASS
   15    FORMAT(/,4X,I3,' VECTORS WITH MEMBERSHIP > ',
     1F5.2,' IN CLASS',I2)
   16 CONTINUE
C
      DO 18 ICLASS=1,KLASES
         WRITE(LU,17) BCOUNT(ICLASS),BETA,ICLASS
   17    FORMAT(/,4X,I3,' VECTORS WITH MEMBERSHIP < ',
     1F5.2,' IN CLASS',I2)
   18 CONTINUE
C
C     BETA IS LISTED BEFORE ALPHA SO THAT THE PRINTED LIMITS
C     MATCH THE ">=" AND "<=" TEXT OF FORMAT 19
      DO 20 ICLASS=1,KLASES
         WRITE(LU,19) ICOUNT(ICLASS),BETA,ALPHA,ICLASS
   19    FORMAT(/,4X,I3,' VECTORS WITH MEMBERSHIP >= ',
     1F5.2,' AND <= ',F5.2,' IN CLASS',I2)
   20 CONTINUE
C
      RETURN
      END
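The three-way test at the centre of CUTSET can be illustrated in isolation. The following is a minimal sketch and not part of the original listing: the program name CUTDEM, the array U, the counters NUP, NLOW and NIN, the five membership values, and the limits 0.80 and 0.20 are all assumptions chosen for illustration. It counts the members of the upper, lower, and inner cut-sets for a single class, using the same strict comparisons as the subroutine.

```fortran
      PROGRAM CUTDEM
C     HYPOTHETICAL DEMONSTRATION, NOT PART OF THE ORIGINAL LISTING.
C     PARTITIONS FIVE ASSUMED MEMBERSHIP VALUES FOR ONE CLASS INTO
C     UPPER (> ALPHA), LOWER (< BETA) AND INNER CUT-SETS.
      INTEGER N
      PARAMETER (N=5)
      REAL U(N),ALPHA,BETA
      INTEGER NUP,NLOW,NIN,I
      DATA U /0.95,0.10,0.50,0.80,0.20/
      ALPHA=0.80
      BETA=0.20
      NUP=0
      NLOW=0
      NIN=0
      DO 10 I=1,N
         IF(U(I).GT.ALPHA) THEN
            NUP=NUP+1
         ELSE IF(U(I).LT.BETA) THEN
            NLOW=NLOW+1
         ELSE
C           VALUES EQUAL TO EITHER LIMIT FALL IN THE INNER SET
            NIN=NIN+1
         END IF
   10 CONTINUE
      WRITE(*,20) NUP,NLOW,NIN
   20 FORMAT(' UPPER =',I3,'  LOWER =',I3,'  INNER =',I3)
      END
```

For these assumed values the counts are 1 upper, 1 lower, and 3 inner, because the memberships 0.80 and 0.20 equal the limits and therefore satisfy neither strict comparison.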
$OPTIMIZE

C
C     A SUBROUTINE WHICH COMPUTES AND OUTPUTS
C     A CONFUSION MATRIX OF THE HARD PARTITION
C     WHICH RESULTS FROM ASSIGNING A DATA
C     VECTOR TO THE CLASS IN WHICH IT HAS
C     MAXIMUM MEMBERSHIP FUNCTION VALUE.
C
C     WRITTEN BY: MICHAEL R. GRAY
C
C     COMPLETED: APR 84
C
C     FILENAME: MGCMTRIX.FTN
C
C     CALLING SEQUENCE (FROM A FORTRAN ROUTINE):
C
C        CALL CMTRIX(KLASES,VCOUNT,LU,CHOICE)
C
C     INPUT VARIABLES (NOT CHANGED):
C
C        KLASES - INTEGER COUNT OF NUMBER OF CLASSES
C
C        VCOUNT - INTEGER COUNT OF NUMBER OF DATA SAMPLES
C
C        LU     - INTEGER WHICH HOLDS LOGICAL UNIT
C                 NUMBER FOR OUTPUT
C
C        CHOICE - INTEGER WHICH SELECTS WHETHER TO CONSIDER
C                 MEMBERSHIPS PRODUCED BY CLUSTERING
C                 ALGORITHM(S) OR CLASSIFICATION ALGORITHM(S)
C
C        MFUNCT - REAL ARRAY 3 BY 242 WHICH HOLDS MEMBERSHIPS
C                 ASSIGNED BY CLUSTERING ALGORITHM, PASSED
C                 IN LABELLED FORTRAN COMMON "AREA3"
C
C        NMFUNC - REAL ARRAY 3 BY 242 WHICH HOLDS MEMBERSHIPS
C                 ASSIGNED BY CLASSIFICATION ALGORITHM, PASSED
C                 IN LABELLED FORTRAN COMMON "AREA5"
C
C     OUTPUT VARIABLES:
C
C        WRONG  - INTEGER*2 ARRAY OF DIMENSION 242 WHICH HOLDS
C                 INDICES OF MISCLASSIFIED SAMPLES, PASSED
C                 IN LABELLED FORTRAN COMMON "AREA4"
C
C        WRNCNT - INTEGER WHICH HOLDS THE COUNT OF
C                 MISCLASSIFIED SAMPLES, PASSED
C                 IN LABELLED FORTRAN COMMON "AREA4"
C

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     PSEUDO - CODE
C
C     ENTER CMTRIX
C     INITIALIZE "WRNCNT" TO 0
C     INITIALIZE VECTOR INDEX TO 1
C     DO UNTIL(CLASSIFICATION OF ALL SAMPLES DETERMINED)
C        DETERMINE ACTUAL CLASS OF CURRENT VECTOR
C        IF(CHOICE <= 2) THEN
C           DETERMINE CLASS OF MAXIMUM MEMBERSHIP
C           FOR CURRENT VECTOR FROM "MFUNCT" ARRAY
C        ELSE
C           DETERMINE CLASS OF MAXIMUM MEMBERSHIP
C           FOR CURRENT VECTOR FROM "NMFUNC" ARRAY
C        END IF
C        INCREMENT CONFUSION MATRIX ELEMENT (ACTUAL,MAXIMUM)
C        IF(ACTUAL NOT EQUAL MAXIMUM) THEN
C           INCREMENT "WRNCNT"
C           PUT INDEX OF MISCLASSIFIED SAMPLE IN "WRONG" ARRAY
C        END IF
C        INCREMENT VECTOR INDEX
C     END DO UNTIL
C     OUTPUT RESULTANT CONFUSION MATRIX
C     RETURN
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
      SUBROUTINE CMTRIX(KLASES,VCOUNT,LU,CHOICE)
C
      REAL MFUNCT(3,242),NMFUNC(3,242)
      INTEGER CARRAY(3,3),VCOUNT,VECTOR,CLASS
      INTEGER ACTUAL,CHOICE,WRNCNT,START(3),END(3)
      INTEGER*2 WRONG(242)
C
      COMMON /AREA3/MFUNCT /AREA5/NMFUNC
      COMMON /AREA6/START,END /AREA4/WRNCNT,WRONG
C
C     DO UNTIL(CONFUSION MATRIX "ZEROED")
      DO 2 I=1,KLASES
         DO 1 J=1,KLASES
            CARRAY(I,J)=0
    1    CONTINUE
    2 CONTINUE
C     END DO UNTIL
C
      ERRMAX=0.0
      WRNCNT=0
C
C     DO UNTIL(ALL VECTORS ASSIGNED TO A CLASS)
      DO 4 VECTOR=1,VCOUNT
         DMAX=0.0
         DO 3 CLASS=1,KLASES
C           FIND ACTUAL CLASS VECTOR BELONGS IN
            IF((VECTOR.GE.START(CLASS)).AND.
     1         (VECTOR.LE.END(CLASS))) THEN
               ACTUAL=CLASS
            END IF
C           FIND CLASS OF MAXIMUM MEMBERSHIP
            IF(CHOICE.LE.2) THEN
               IF(MFUNCT(CLASS,VECTOR).GT.DMAX) THEN
                  DMAX=MFUNCT(CLASS,VECTOR)
                  NUMBER=CLASS
               END IF
            ELSE
               IF(NMFUNC(CLASS,VECTOR).GT.DMAX) THEN
                  DMAX=NMFUNC(CLASS,VECTOR)
                  NUMBER=CLASS
               END IF
            END IF
    3    CONTINUE
         CARRAY(ACTUAL,NUMBER)=CARRAY(ACTUAL,NUMBER)+1
         IF(NUMBER.NE.ACTUAL) THEN
            WRNCNT=WRNCNT+1
            WRONG(WRNCNT)=VECTOR
         END IF
    4 CONTINUE
C     END DO UNTIL
C
      WRITE(LU,5)
    5 FORMAT(/,' THE HARD PARTITION SHOWN IN THE ',
     1'CONFUSION MATRIX WAS CONSTRUCTED USING MAXIMUM ',
     2'MEMBERSHIP VALUE FOR EACH CLASS.')
C
      WRITE(LU,6) KLASES,KLASES
    6 FORMAT(' ',7X,'CONFUSION MATRIX: ROWS 1-',I2,
     1' SHOW CLASSIFICATION OF CLASSES 1-',I2,
     2',',' RESPECTIVELY.')